The following analyses aim to describe the sleep habits of a sample of undergraduate students (N = 82) and to evaluate the role of demographical, dispositional and contextual predictors. Since data are hierarchically structured into two levels (Level 1 = night, Level 2 = student), these goals are pursued using linear mixed-effects regression (LMER) models.
Specifically, the analyses are conducted following a model comparison approach, where the predictors of interest are subsequently included in more parsimonious models, starting from a null model (with no predictors). Each model is compared to the previous one in terms of data reproducibility (likelihood ratio test, LRT) and plausibility (Aikake Information Criterion, AIC).
The analyses are performed separately for each considered sleep measure: total sleep time (TST), sleep efficiency (SE), sleep onset latency (SOL), wake after sleep onset (WASO), bedtime (BT), and wake time, (WT). The predictors of interest are included in the following order:
day of week: differences between weekdays and weekend are analyzed, hypothesizing shorter TST and earlier BT and WT during the weekend
sex: differences between females and males are analyzed, hypothesizing earlier bedtime and longer TST for females
circadian preferences: higher MEQ.r scores are hypothesized to predict earlier bed (BT) and wake time (WT)
perceived sleep disturbances: higher PSQI scores hypothesized to predict lower TST and SE, and longer SOL and WASO
depressive symptoms: higher BDI scores are hypothesized to predict more sleep problems, similarly to the above-mentioned hypothesis
students’ nap frequency: daily daytime nap as recorded with actigraphy using a set of rules modified from Patel and colleagues (2015)
students’ habits: participants reporting to smoke, or to drink coffé and alcohol more frequently are hypothesized to show more sleep problems, similarly to the above-mentioned hypothesis
The document is structured as follows:
the dataset is loaded and converted from a wide form (one row per participant) to a long form (one row per night)
descriptives and intraclass correlation coefficients (ICCs) are computed for each sleep measure
a graphical inspection is performed for all sleep measures
the goodness of fit is assessed for each sleep measure to check the suitability of the normal and alternative distributions
an influential analysis is performed to assess the presence of participants influencing the parameters estimation
LMER models comparison is finally performed per each measure and the results of the selected model are discussed
The first step is to load the dataset and to transform it from a wide form (one row per participant) to a long form (one row per night) to be used with LMER models.
rm(list=ls())
library(tidyr); library(textclean); library(dplyr)
# sleep <- read.csv2("Rechecked BT WT all 26.7.19.csv")
sleep <- read.csv2("Rechecked_nap.csv")
# some numeric variables is erroneously read as factor or character
sleep[,13:ncol(sleep)] <- lapply(sleep[,13:ncol(sleep)], as.character)
sleep[,13:ncol(sleep)] <- lapply(sleep[,13:ncol(sleep)], as.numeric)
sleep$AGE <- as.numeric(as.character(sleep$AGE))
sleep$CAFFE_week <- as.numeric(as.character(sleep$CAFFE_week))
sleep$FUMO_Giorno <- as.numeric(as.character(sleep$FUMO_Giorno))
sleep$ALCOL_week <- as.numeric(as.character(sleep$ALCOL_week))
# from wide to long x each variable
sleep.long <- gather(data=sleep,day,BT,BT_LUNEDI:BT_DOMENICA)
sleep.long$WT <- gather(data=sleep,day,WT,WT_LUNEDI:WT_DOMENICA)[,117]
sleep.long$TIB <- gather(data=sleep,day,TIB,TIB_LUNEDI:TIB_DOMENICA)[,117]
sleep.long$TST <- gather(data=sleep,day,TST,TST_LUNEDI:TST_DOMENICA)[,117]
sleep.long$SOL <- gather(data=sleep,day,SOL,SOL_LUNEDI:SOL_DOMENICA)[,117]
sleep.long$WAKE <- gather(data=sleep,day,WAKE,WAKE_LUNEDI:WAKE_DOMENICA)[,117]
sleep.long$SE <- as.numeric(gather(data=sleep,day,SE,SE_LUNEDI:SE_DOMENICA)[,117])
sleep.long$WASO <- gather(data=sleep,day,WASO,WASO_LUNEDI:WASO_DOMENICA)[,117]
sleep.long$NAP <- gather(data=sleep,day,NAP,NAP_LUNEDI:NAP_DOMENICA)[,117]
# converting categorical predictors as factor
sleep.long[c("FUMO..SI.NO.","ALCOL..SI.NO.","CAFFE..SI.NO.",
"NAP")] <- lapply(sleep.long[c("FUMO..SI.NO.","ALCOL..SI.NO.","CAFFE..SI.NO.",
"NAP")], factor)
# converting two cases of NAP = 2 into NAP = 1
summary(sleep.long$NAP)
## 0 1 2
## 485 87 2
sleep.long[sleep.long$NAP==2,"NAP"] <- 1
sleep.long$NAP <- as.factor(as.character(sleep.long$NAP))
# recoding days
sleep.long$day <- mgsub(mgsub(sleep.long$day,
c("BT","WT","TIB","TST","SOL","WAKE","SE","WASO","_"),
rep("",9)),
c("LUNEDI","MARTEDI","MERCOLEDI","GIOVEDI","VENERDI","SABATO","DOMENICA"),
c( 1, 2, 3, 4, 5, 6, 7))
sleep.long$day <- as.factor(sleep.long$day)
# order by ID and day
sleep.long <- sleep.long[order(sleep.long$ID,sleep.long$day),]
rownames(sleep.long) <- 1:nrow(sleep.long)
sleep.long[,c("ID","day","BT","WT","TIB","TST","SE","SOL","WAKE","SE","WASO")]
# sanity check
sleep.long[1,"TST"] == sleep[sleep$ID=="EM01","TST_LUNEDI"]
## [1] TRUE
# sanity check
sleep.long[sleep.long$ID=="EM04" & sleep.long$day==3,"SE"] == sleep[sleep$ID=="EM04","SE_MERCOLEDI"]
## [1] TRUE
# sanity check
sleep.long[sleep.long$ID=="MZ55" & sleep.long$day==5,"WASO"] == sleep[sleep$ID=="MZ55","WASO_VENERDI"]
## [1] TRUE
# take only columns of interest
sleep.long <- sleep.long[,c("ID","day","BT","WT","TIB","TST","SOL","WAKE","SE","WASO","NAP",
colnames(sleep.long)[c(5:23,30:43)])]
Comments:
Then, we create the new variable week.time, with value “wd” for weekdays (Sunday to Thursday) and “we” for weekend (Friday to Saturday).
sleep.long$week.time <- "wd"
sleep.long[sleep.long$day==5|sleep.long$day==6,"week.time"] <- "we"
sleep.long$week.time <- as.factor(sleep.long$week.time)
sleep.long <- sleep.long[,c("ID","day","week.time",
colnames(sleep.long)[3:(ncol(sleep.long)-1)])]
# sanity check
sleep.long[,c("ID","day","week.time")]
Then, we re-codify Wake Time (WT, i.e., the hours between midnight and lights on), which is currently referred to the subsequent day (e.g., Friday WT is Saturday WT, Saturday WT is Sunday WT).
# adjusting wake time (WT) - expressed as hours from 00:00 --> referred to the subsequent day!
WT <- sleep.long[,c("ID","day","WT")]
WT <- WT %>%
group_by(ID) %>%
mutate(WT.lag = dplyr::lead(WT, n = 1))
# make SUNDAY WT as that previously indicated on SATURDAY
IDs <- levels(as.factor(WT$ID))
for(ID in IDs){
WT[WT$ID==ID & WT$day==7,"WT.lag"] <- WT[WT$ID==ID & WT$day==1,"WT"]
}
WT # sanity check
sleep.long$WT.dayAfter <- as.numeric(WT$WT.lag)
mean(na.omit(sleep.long[sleep.long$week.time=="wd","WT.dayAfter"]))
## [1] 8.516297
mean(na.omit(sleep.long[sleep.long$week.time=="we","WT.dayAfter"]))
## [1] 9.398926
Finally, we express TST and TIB in hours, instead of minutes
sleep.long$TST <- sleep.long$TST/60
sleep.long$TIB <- sleep.long$TIB/60
Here the meaning of the variables of interest is explained.
sleep.long
ID is the identification code of each participant
day is the number of the night where sleep measures are recorded in a given participant. It ranges from 1 (Monday) to 7 (Sunday)
week.time indicates if the considered day is a weekday (wd) or weekend (we)
BT is bedtime, encoded as the number of hours between the previous midnight and the time when participants started sleeping
WT is wake time, encoded as the number of hours between the previous midnight and the time when participants woke up
WT.dayAfter is WT referred to the following day
TIB is total time in bed, encoded as the minutes between BT and WT
TST is total sleep time, encoded as the minutes of effective sleep between BT and WT
SE is sleep efficiency, eexpressed as the percentage of sleep time over TIB (SE = 100*TST/TIB)
SOL is sleep onset latency, encoded as the minutes between BT and the first sleep epoch (sleep onset)
WASO is wake after sleep onset, encoded as the total minutes of wake after sleep onset
NAP indicates if a nap was taken (1) or not (0) by a given participant in a given day
SEX is participants sex
CAFFE_week is the average number of coffees consumed by a given participant each week
FUMO..SI.NO. indicates if a given participant is a smoker (N = 24, 29%) or not
ALCOL_WEEK is the average number of alcoholic units consumed by a given participant each week
MEQ.R is the MEQ-r score of a given participant, indicating his/her circadian preferences
BDI.II is the BDI-II score of a given participant, indicating his/her depressive symptoms
PSQI is the PSQI score of a given participant, indicating his/her preceived sleep problems
Here, our goal is to describe the variables of interest with a focus on intra-individual variability. Specifically, we compute the intraclass correlation coefficients (ICCs) to estimate how the variance is distributed on both the within and the between levels. The ICCs range from 0 (all variance is within participants) to 1 (all variance is between participants).
library(ICC); library(Rmisc); library(knitr); library(dplyr); library(ggplot2); library(knitr)
windowsFonts(CMU=windowsFont("CMU Serif Roman"),time=windowsFont("Times New Roman"))
desc.table <- data.frame(measure=as.character(c("BT","WT.dayAfter","TIB","TST","SE","SOL","WASO")),
mean=rep(NA,7),sd=rep(NA,7),
mean_F=rep(NA,7),sd_F=rep(NA,7),mean_M=rep(NA,7),sd_M=rep(NA,7),
mean_WD=rep(NA,7),sd_WD=rep(NA,7),mean_WE=rep(NA,7),sd_WD=rep(NA,7),ICC=rep(NA,7))
desc.table$measure <- as.character(desc.table$measure)
for(i in 1:nrow(desc.table)){
desc.table[i,2:3] <- round(Rmisc::summarySE(sleep.long, measurevar=desc.table[i,"measure"],
na.rm=TRUE)[3:4],2)
desc.table[i,4:5] <- round(Rmisc::summarySE(sleep.long, measurevar=desc.table[i,"measure"],
groupvars="SEX",na.rm=TRUE)[1,3:4],2)
desc.table[i,6:7] <- round(Rmisc::summarySE(sleep.long, measurevar=desc.table[i,"measure"],
groupvars="SEX",na.rm=TRUE)[2,3:4],2)
desc.table[i,8:9] <- round(Rmisc::summarySE(sleep.long, measurevar=desc.table[i,"measure"],
groupvars="week.time",na.rm=TRUE)[1,3:4],2)
desc.table[i,10:11] <- round(Rmisc::summarySE(sleep.long, measurevar=desc.table[i,"measure"],
groupvars="week.time",na.rm=TRUE)[2,3:4],2)
data4ICC <- sleep.long[,c("ID",desc.table[i,"measure"])]
desc.table[i,12] <- round(ICCest(x=ID,y=desc.table[i,"measure"],data=na.omit(data4ICC))$'ICC',2)
}
kable(desc.table,digits=2, format="pandoc", caption="Table 1: Descriptive Statistics of level-1 variables")
| measure | mean | sd | mean_F | sd_F | mean_M | sd_M | mean_WD | sd_WD | mean_WE | sd_WD.1 | ICC |
|---|---|---|---|---|---|---|---|---|---|---|---|
| BT | 25.19 | 1.72 | 24.96 | 1.64 | 25.46 | 1.79 | 24.99 | 1.55 | 25.68 | 2.02 | 0.38 |
| WT.dayAfter | 8.77 | 1.49 | 8.73 | 1.41 | 8.82 | 1.59 | 8.52 | 1.40 | 9.40 | 1.54 | 0.32 |
| TIB | 7.59 | 1.40 | 7.78 | 1.46 | 7.37 | 1.30 | 7.53 | 1.38 | 7.76 | 1.44 | 0.14 |
| TST | 6.56 | 1.28 | 6.78 | 1.34 | 6.30 | 1.16 | 6.50 | 1.25 | 6.72 | 1.33 | 0.20 |
| SE | 86.41 | 5.77 | 87.12 | 5.22 | 85.58 | 6.26 | 86.34 | 5.78 | 86.56 | 5.74 | 0.64 |
| SOL | 8.82 | 13.48 | 8.45 | 15.46 | 9.25 | 10.72 | 8.91 | 14.27 | 8.60 | 11.33 | 0.24 |
| WASO | 52.98 | 27.27 | 51.56 | 25.26 | 54.66 | 29.41 | 52.69 | 27.81 | 53.71 | 25.96 | 0.53 |
Comments:
descriptive statistics suggest that on average participants fall asleep around 01:00 a.m. and wake up around 8:45 a.m. On average, they spend 7.5 hours in bed, of which only 6.5 hours are effective sleep, with an average sleep efficiency of 86.42%. The average sleep onset latency is 9 minutes and the average wake time after sleep onset is one hour (both measures show high variability between participants).
descriptively, we observe no evident differences between weekdays and weekends, with the exception of TIB and TST.
all ICCs but those of WASO and SE are below .5, indicating that most variability in sleep measures is intra-individual variability, and thus low sleep consistency in the sample. The more stable sleep measures are WASO and SE, whereas the more varying measure, is TIB.
We also create an extended version of this table, showing means and SD by gender and week day.
desc.table2 <- data.frame(measure=rep(c("BT","WT.dayAfter","TIB","TST","SE","SOL","WASO"),each=2),
sex=rep(c("F","M"),7),
MONmean=rep(NA,14),MONsd=rep(NA,14),TUEmean=rep(NA,14),TUEsd=rep(NA,14),
WEDmean=rep(NA,14),WEDsd=rep(NA,14),THUmean=rep(NA,14),THUsd=rep(NA,14),
FRYmean=rep(NA,14),FRYsd=rep(NA,14),SATmean=rep(NA,14),SATsd=rep(NA,14),
SUNmean=rep(NA,14),SUNsd=rep(NA,14))
for(i in 1:nrow(desc.table2)){
stats <- Rmisc::summarySE(sleep.long,measurevar=as.character(desc.table2[i,"measure"]),
groupvars=c("SEX","day"),na.rm=TRUE)
desc.table2[i,c("MONmean","MONsd")] <- round(stats[stats$day==1&stats$SEX==desc.table2[i,"sex"],4:5],2)
desc.table2[i,c("TUEmean","TUEsd")] <- round(stats[stats$day==2&stats$SEX==desc.table2[i,"sex"],4:5],2)
desc.table2[i,c("WEDmean","WEDsd")] <- round(stats[stats$day==3&stats$SEX==desc.table2[i,"sex"],4:5],2)
desc.table2[i,c("THUmean","THUsd")] <- round(stats[stats$day==4&stats$SEX==desc.table2[i,"sex"],4:5],2)
desc.table2[i,c("FRYmean","FRYsd")] <- round(stats[stats$day==5&stats$SEX==desc.table2[i,"sex"],4:5],2)
desc.table2[i,c("SATmean","SATsd")] <- round(stats[stats$day==6&stats$SEX==desc.table2[i,"sex"],4:5],2)
desc.table2[i,c("SUNmean","SUNsd")] <- round(stats[stats$day==7&stats$SEX==desc.table2[i,"sex"],4:5],2)
}
kable(desc.table2,digits=2, format="pandoc", caption="Table 1: Descriptive Statistics by gender and day")
| measure | sex | MONmean | MONsd | TUEmean | TUEsd | WEDmean | WEDsd | THUmean | THUsd | FRYmean | FRYsd | SATmean | SATsd | SUNmean | SUNsd |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BT | F | 24.70 | 1.42 | 24.66 | 1.06 | 24.85 | 1.44 | 24.92 | 1.51 | 25.19 | 1.58 | 25.75 | 2.42 | 24.63 | 1.52 |
| BT | M | 25.09 | 1.47 | 25.16 | 1.41 | 25.37 | 1.74 | 25.16 | 1.36 | 25.81 | 2.11 | 26.06 | 1.82 | 25.55 | 2.26 |
| WT.dayAfter | F | 8.28 | 1.15 | 8.11 | 1.09 | 8.68 | 1.25 | 8.86 | 1.60 | 9.11 | 1.51 | 9.47 | 1.43 | 8.60 | 1.34 |
| WT.dayAfter | M | 8.49 | 1.28 | 8.17 | 1.59 | 8.50 | 1.44 | 8.49 | 1.30 | 9.34 | 1.82 | 9.72 | 1.37 | 8.97 | 1.79 |
| TIB | F | 7.48 | 1.56 | 7.51 | 1.31 | 7.93 | 1.17 | 7.76 | 1.73 | 8.01 | 1.40 | 7.88 | 1.53 | 7.91 | 1.43 |
| TIB | M | 7.38 | 1.24 | 7.03 | 1.13 | 7.10 | 1.18 | 7.38 | 1.35 | 7.47 | 1.25 | 7.63 | 1.55 | 7.60 | 1.34 |
| TST | F | 6.54 | 1.49 | 6.53 | 1.16 | 6.86 | 1.01 | 6.78 | 1.54 | 7.02 | 1.42 | 6.85 | 1.33 | 6.92 | 1.33 |
| TST | M | 6.23 | 1.10 | 6.00 | 1.01 | 6.10 | 0.91 | 6.30 | 1.16 | 6.46 | 1.15 | 6.50 | 1.35 | 6.49 | 1.34 |
| SE | F | 87.14 | 5.22 | 87.06 | 5.21 | 86.58 | 4.52 | 87.28 | 5.05 | 87.29 | 7.00 | 87.08 | 4.56 | 87.39 | 4.89 |
| SE | M | 84.67 | 6.55 | 85.54 | 6.89 | 86.44 | 6.00 | 85.54 | 5.73 | 86.45 | 5.51 | 85.25 | 5.58 | 85.15 | 7.57 |
| SOL | F | 9.05 | 12.18 | 7.98 | 10.48 | 9.41 | 19.69 | 10.14 | 27.77 | 8.16 | 10.53 | 7.66 | 10.23 | 6.80 | 8.14 |
| SOL | M | 8.78 | 8.75 | 8.08 | 7.78 | 7.97 | 9.17 | 10.16 | 11.39 | 8.53 | 14.03 | 10.29 | 10.67 | 10.89 | 12.23 |
| WASO | F | 47.91 | 21.91 | 50.91 | 24.70 | 54.95 | 23.43 | 48.95 | 29.15 | 50.98 | 24.33 | 54.16 | 28.41 | 53.02 | 24.99 |
| WASO | M | 59.81 | 33.55 | 53.76 | 32.51 | 51.78 | 26.64 | 54.37 | 27.03 | 52.55 | 25.30 | 57.50 | 26.05 | 52.87 | 34.80 |
Finally, we can take a look at the variables on level 2 (between-individuals):
gridExtra::grid.arrange(ggplot(sleep,aes(SEX))+geom_bar()+ggtitle("SEX"),
ggplot(sleep,aes(round(AGE,0)))+geom_bar()+ggtitle("AGE"),
ggplot(sleep,aes(FUMO..SI.NO.))+geom_bar()+ggtitle("SMOKE"),
ggplot(sleep,aes(ALCOL..SI.NO.))+geom_bar()+ggtitle("ALCOHOL"))
desc2 <- rbind(
sleep[sleep$CAFFE..SI.NO.==1,] %>%
dplyr::select(CAFFE_week) %>%
gather("Variable", "value") %>%
group_by(Variable) %>%
summarise(Mean=round(mean(value, na.rm=TRUE),2),
SD=round(sd(value, na.rm=TRUE),2),
min=round(min(value, na.rm=TRUE), 2),
max=round(max(value, na.rm=TRUE),2),
N=length(which(!is.na(value)))),
sleep[sleep$FUMO..SI.NO.==1,] %>%
dplyr::select(FUMO_Giorno) %>%
gather("Variable", "value") %>%
group_by(Variable) %>%
summarise(Mean=round(mean(value, na.rm=TRUE),2),
SD=round(sd(value, na.rm=TRUE),2),
min=round(min(value, na.rm=TRUE), 2),
max=round(max(value, na.rm=TRUE),2),
N=length(which(!is.na(value)))),
sleep[sleep$ALCOL..SI.NO.==1,] %>%
dplyr::select(ALCOL_week) %>%
gather("Variable", "value") %>%
group_by(Variable) %>%
summarise(Mean=round(mean(value, na.rm=TRUE),2),
SD=round(sd(value, na.rm=TRUE),2),
min=round(min(value, na.rm=TRUE), 2),
max=round(max(value, na.rm=TRUE),2),
N=length(which(!is.na(value)))),
sleep %>%
dplyr::select(c(MEQ.r:PSQI)) %>%
gather("Variable", "value") %>%
group_by(Variable) %>%
summarise(Mean=round(mean(value, na.rm=TRUE),2),
SD=round(sd(value, na.rm=TRUE),2),
min=round(min(value, na.rm=TRUE), 2),
max=round(max(value, na.rm=TRUE),2),
N=length(which(!is.na(value)))))
kable(desc2,digits=2, format="pandoc", caption="Table 2: Descriptive Statistics of level-2 variables")
| Mean | SD | min | max | N |
|---|---|---|---|---|
| 12.43 | 7.46 | 1.0 | 35 | 66 |
| 6.54 | 5.33 | 1.0 | 20 | 24 |
| 4.08 | 3.65 | 0.5 | 15 | 72 |
| 9.84 | 5.92 | 0.0 | 38 | 246 |
gridExtra::grid.arrange(ggplot(sleep,aes(BDI.II))+geom_bar()+ggtitle("BDI-II"),
ggplot(sleep,aes(round(PSQI,0)))+geom_bar()+ggtitle("PSQI"),
ggplot(sleep,aes(MEQ.r))+geom_bar()+ggtitle("MEQ.r"),nrow=3)
ggplot(sleep,aes(x=PSQI,fill=SEX))+
geom_histogram(color="white",position="identity",binwidth = 1)+
geom_vline(xintercept=5.5,lty=2,lwd=1)+
scale_fill_manual(name="Gender",
values=c("lightgray",rgb(.08, .08, .08,alpha=.5)),
labels=c("Women","Men"))+
theme_classic()+labs(x="PSQI scores",y="Frequency")+
theme(text=element_text(size=20,face="bold",family="time"))
Comments:
the sample is gender-balanced, but it is mainly composed by non-smokers (N = 58, 70%) and alcohol users (N = 72, 88%).
the sample is also homogeneous in terms of age, with the exception of one 35-year-old participant (could be an influential case!)
on average, participants consume 12 coffees per week, with smokers reporting to smoke an average number of 6.5 cigarettes per day and alcohol users reporting to drink an average number of 4 alcoholic units per week
the average BDI-II score is 10.22, which is below the suggested cutoff of 13. Only 22 participants (27%) show a score higher than 13.
the average MEQ-r score is 13.23, suggesting that the average participant reports an “intermediate” circadian preference, with 6 (7%) “early birds” (preferring earlier bed and wake time) and 22 (27%) “night owls” (preferring later bed and wake time)
the average PSQI score is 6.06, which is slightly above the suggested cut-off of 5, suggesting the presence of a relatively high number of participants reporting sleep disturbances. Indeed, 44 participants (54%) have a PSQI score higher than 5
Here, the variables’ distributions are further examined by a graphical inspection. A first data is related to sleep consistency and it can be shown in terms of the standard deviation of each considered sleep metric.
library(ggplot2); library(extrafont); library(cowplot); library(Rmisc)
# function for geom_flat_violin
source("https://gist.githubusercontent.com/benmarwick/2a1bb0133ff568cbe28d/raw/fb53bd97121f7f9ce947837ef1a4c65a73bffb3f/geom_flat_violin.R")
# metrics standard deviations
gridExtra::grid.arrange(ggplot(sleep,aes(x=WT_SD))+
geom_histogram(bins=40) + ggtitle("Wake Time") + xlab("St. Dev") +
theme(text=element_text(family="CMU")),
ggplot(sleep,aes(x=BT_SD))+
geom_histogram(bins=40) + ggtitle("Bed Time") + xlab("St. Dev") +
theme(text=element_text(family="CMU")),
ggplot(sleep,aes(x=TIB_SD))+
geom_histogram(bins=40) + ggtitle("Time in Bed") + xlab("St. Dev")+
theme(text=element_text(family="CMU")),
ggplot(sleep,aes(x=TST_SD))+
geom_histogram(bins=40) + ggtitle("Total Sleep Time") + xlab("St. Dev")+
theme(text=element_text(family="CMU")),
ggplot(sleep,aes(x=SOL_SD))+
geom_histogram(bins=40) + ggtitle("Sleep Onset Time") + xlab("St. Dev")+
theme(text=element_text(family="CMU")),
ggplot(sleep,aes(x=SE_SD))+
geom_histogram(bins=40) + ggtitle("Sleep Efficiency") + xlab("St. Dev")+
theme(text=element_text(family="CMU")),
ggplot(sleep,aes(x=WASO_SD))+
geom_histogram(bins=40) + ggtitle("Wake After Sleep Onset") +
xlab("St. Dev")+ theme(text=element_text(family="CMU")),nrow=2)
Comments:
we can observe a degree of variability in each variable, with most cases ranging between 50 and 70 minutes for sleep and wake time, time in bed and total sleep time, and between 10 and 20 minutes for sleep onset latency and wake after sleep onset.
the most consistent variable is sleep efficiency, which shows an average SD around 3%.
Then, we can take a look at the day-by-day variability within each participant. Below, each variable is plotted day-by-day for each participant and considering the whole sample.
sleep.long$day <- as.numeric(sleep.long$day)
# TIB AND TST by participant
ggplot(sleep.long,aes(x=day,y=TIB)) +
geom_smooth(se=FALSE,span=0.7,colour="black") +
geom_point(size=1,colour="black") +
geom_smooth(aes(y=TST),colour="#99c2ff",se=FALSE,span=0.7) + # insoddisfatto
geom_point(aes(y=TST),size=1,colour="#99c2ff") +
geom_vline(aes(xintercept=5.5))+
facet_wrap("ID",strip.position="right") +
ggtitle("Time in bed (black) and Total Sleep Time (blue)")+ylab("TST (min)")+
theme(axis.text = element_text(size=5),text=element_text(family="CMU"))+
scale_x_continuous(breaks = c(2,4,6),labels=c("TUE","THU","SAT"))
# TIB AND TST distributions
ggplot(sleep.long,aes(x=as.factor(as.character(day)),y=TST,fill=as.factor(as.character(day))))+
geom_flat_violin(position = position_nudge(x = .2, y = 0),adjust =2,size=1.5)+
geom_flat_violin(aes(y=TIB),fill="gray",position = position_nudge(x = .2, y = 0),adjust =2,size=1.5,alpha=.5)+
geom_point(aes(col=as.factor(as.character(day))),position = position_jitter(width = .15),size=2)+
geom_boxplot(aes(x = as.factor(as.character(day)), y = TST, fill = as.factor(as.character(day))),size=1.5,
outlier.shape = NA, alpha = .5, width = .1, colour = "black")+
labs(x="",y="TST (min)")+ggtitle("Total Sleep Time and Time in bed (in gray)")+
scale_x_discrete(labels=c("MON","TUE","WED","THU","FRY","SAT","SUN"))+
theme(text = element_text(family = "CMU",size=18,face="bold"),legend.position = "none")
# SE by participants
ggplot(sleep.long,aes(x=day,y=SE)) +
geom_smooth(se=FALSE,span=0.7,colour="black") +
geom_point(size=1,colour="black") +
geom_vline(aes(xintercept=5.5))+
facet_wrap("ID",strip.position="right") +
ggtitle("Sleep Efficiency")+ylab("SE (%)")+
theme(axis.text = element_text(size=5),text=element_text(family="CMU"))+
scale_x_continuous(breaks = c(2,4,6),labels=c("TUE","THU","SAT"))
# SE distributions
ggplot(sleep.long,aes(x=as.factor(as.character(day)),y=SE,fill=as.factor(as.character(day))))+
geom_flat_violin(position = position_nudge(x = .2, y = 0),adjust =2,size=1.5)+
geom_point(aes(col=as.factor(as.character(day))),position = position_jitter(width = .15),size=2)+
geom_boxplot(aes(x = as.factor(as.character(day)), y = SE, fill = as.factor(as.character(day))),size=1.5,
outlier.shape = NA, alpha = .5, width = .1, colour = "black")+
labs(x="",y="SE (%)")+ggtitle("Sleep Efficiency")+
scale_x_discrete(labels=c("MON","TUE","WED","THU","FRY","SAT","SUN"))+
theme(text = element_text(family = "CMU",size=18,face="bold"),legend.position = "none")
# WASO and SOL by participant
ggplot(sleep.long,aes(x=day,y=WASO)) +
geom_smooth(se=FALSE,span=0.7,colour="darkred") +
geom_point(size=1,colour="darkred") +
geom_smooth(aes(y=SOL),se=FALSE,span=0.7,colour="salmon") +
geom_point(aes(y=SOL),size=1,colour="salmon") +
geom_vline(aes(xintercept=5.5))+
facet_wrap("ID",strip.position="right") +
ggtitle("Wake After Sleep Onset (dark red) and Sleep Onset Latency (light red)")+ylab("WASO and SOL (min)")+
theme(axis.text = element_text(size=5),text=element_text(family="CMU"))+
scale_x_continuous(breaks = c(2,4,6),labels=c("TUE","THU","SAT"))
# WASO distributions
ggplot(sleep.long,aes(x=as.factor(as.character(day)),y=WASO,fill=as.factor(as.character(day))))+
geom_flat_violin(position = position_nudge(x = .2, y = 0),adjust =2,size=1.5)+
geom_point(aes(col=as.factor(as.character(day))),position = position_jitter(width = .15),size=2)+
geom_boxplot(aes(x = as.factor(as.character(day)), y = WASO, fill = as.factor(as.character(day))),size=1.5,
outlier.shape = NA, alpha = .5, width = .1, colour = "black")+
labs(x="",y="WASO (min)")+ggtitle("Wake After Sleep Onset")+
scale_x_discrete(labels=c("MON","TUE","WED","THU","FRY","SAT","SUN"))+
theme(text = element_text(family = "CMU",size=18,face="bold"),legend.position = "none")
# SOL distributions
ggplot(sleep.long[!is.na(sleep.long$SOL)&sleep.long$SOL<100,],aes(x=as.factor(as.character(day)),y=SOL,fill=as.factor(as.character(day))))+
geom_flat_violin(position = position_nudge(x = .2, y = 0),adjust =2,size=1.5)+
geom_point(aes(col=as.factor(as.character(day))),position = position_jitter(width = .15),size=2)+
geom_boxplot(aes(x = as.factor(as.character(day)), y = SOL, fill = as.factor(as.character(day))),size=1.5,
outlier.shape = NA, alpha = .5, width = .1, colour = "black")+
labs(x="",y="SOL (min)")+ggtitle("Sleep Onset Latency")+
scale_x_discrete(labels=c("MON","TUE","WED","THU","FRY","SAT","SUN"))+
theme(text = element_text(family = "CMU",size=18,face="bold"),legend.position = "none")
# WT distributions
ggplot(sleep.long,aes(x=as.factor(as.character(day)),y=WT,fill=as.factor(as.character(day))))+
geom_flat_violin(position = position_nudge(x = .2, y = 0),adjust =2,size=1.5)+
geom_point(aes(col=as.factor(as.character(day))),position = position_jitter(width = .15),size=2)+
geom_boxplot(aes(x = as.factor(as.character(day)), y = WT, fill = as.factor(as.character(day))),size=1.5,
outlier.shape = NA, alpha = .5, width = .1, colour = "black")+
labs(x="",y="")+ggtitle("Wake time")+
scale_x_discrete(labels=c("MON","TUE","WED","THU","FRY","SAT","SUN"))+
scale_y_continuous(breaks=c(4,8,12,16),labels=c("04:00","08:00","12:00","16:00"))+
theme(text = element_text(family = "CMU",size=18,face="bold"),legend.position = "none")
# BT distributions
ggplot(sleep.long,aes(x=as.factor(as.character(day)),y=BT-24,fill=as.factor(as.character(day))))+
geom_flat_violin(position = position_nudge(x = .2, y = 0),adjust =2,size=1.5)+
geom_point(aes(col=as.factor(as.character(day))),position = position_jitter(width = .15),size=2)+
geom_boxplot(aes(x = as.factor(as.character(day)), y = BT-24, fill = as.factor(as.character(day))),size=1.5,
outlier.shape = NA, alpha = .5, width = .1, colour = "black")+
labs(x="",y="")+ggtitle("Bed time")+
scale_x_discrete(labels=c("MON","TUE","WED","THU","FRY","SAT","SUN"))+
scale_y_continuous(breaks=c(0,5,10),labels=c("00:00","05:00","10:00"))+
theme(text = element_text(family = "CMU",size=18,face="bold"),legend.position = "none")
# mean values (WT.dayAfter is wake time on the day after)
sleep.long$day <- factor(sleep.long$day,levels=c(7,6,5,4,3,2,1))
means <- summarySE(sleep.long,measurevar="WT.dayAfter",groupvars="day",na.rm=TRUE)
means$BT <- summarySE(sleep.long,measurevar="BT",groupvars="day",na.rm=TRUE)[,3]
means$BT.sd <- summarySE(sleep.long,measurevar="BT",groupvars="day",na.rm=TRUE)[,4]
means$BT.se <- summarySE(sleep.long,measurevar="BT",groupvars="day",na.rm=TRUE)[,5]
means <- means[order(means$day,decreasing=TRUE),]
# plotting BT and following WT
ggplot(sleep.long,aes(x=day,y=BT-24))+
geom_flat_violin(adjust =2,size=0,alpha=.5,fill="gray")+
geom_flat_violin(aes(y=WT.dayAfter),
adjust =2,size=0,alpha=.5,fill="gray")+
# geom_boxplot(outlier.shape = NA,width = .1,fill="gray",
# position = position_nudge(x = -.2, y = 0))+
# geom_boxplot(aes(y=WT2),outlier.shape = NA,width = .1,fill="gray",
# position = position_nudge(x = -.2, y = 0))+
geom_segment(data=means,aes(x=day,xend=day,
y=WT.dayAfter,yend=BT-24),size=2,colour="black")+
geom_point(data=means,aes(x=day,y=WT.dayAfter),size=6,shape=21,fill="gray")+
geom_point(data=means,aes(x=day,y=BT-24),size=6,shape=22,fill="gray")+
labs(x="",y="")+
scale_x_discrete(labels=c("SUN","SAT","FRY","THU","WED","TUE","MON"))+
scale_y_continuous(breaks=c(0,2.5,5,7.5,10,12.5),limits=c(-1,11),
labels=c("00:00","02:30","05:00","07:30","10:00","12:30"))+
theme_classic()+
theme(text = element_text(family = "time",face="bold",size=18),legend.position = "top")+
coord_flip()
Comments:
We observe no clear differences between weekdays and weekend for TIB, TST, SE, WASO, and SOL.
In contrast, slightly higher WT and BT values are showed on Saturday compared to other days, suggesting that participants fall asleep later and wake up later on Saturday.
We can notice one outlier (EM39) for WT on day 4 (Thursday).
The latter graph suggest that both Wednesday and Saturday show slightly higher variability compared to other days.
library(lme4); library(jtools); library(effects) # modeling
library(fitdistrplus) ; library(car) ; library(lattice) # gof
In this section, we assess the goodness of fit (GOF) of each model specified in the “LMER MODELS” section. Specifically, we focus on the most complex model speciefied per each sleep measure, and we evaluate (A) the normality of resuduals distribution, (B) the normality of random effects distribution, and (C) the independence of residuals from fitted values (homoscedasticity).
For each sleep metric, we assess the GOF of the most complex model (including all predictors of interest) and we assess the following LMER models’ assumptions:
independence between residuals and fitted values,
the normality of both residuals’ and random effects’ distribution,
that random effects are centered on zero,
Typically, TST shows a symmetrical shape that can be modeled with a normal distribution (as shown in the GRAPHICS section).
m <- lmer(TST~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),sleep.long)
# residuals normal fit (good)
plot(fitdist(resid(m),distr="norm",method="mle"))
# random effects normal fit (good)
plot(fitdist(ranef(m)$'ID'[[1]],distr="norm",method="mle"))
# LMER assumptions (good)
gridExtra::grid.arrange(plot(m,main="Resid vs Fitted"),
qqmath(residuals(m),main="Residual QQPlot"),
dotplot(ranef(m))[[1]],
plot(ranef(m),main="Random Effects QQplot")[[1]],
nrow=2)
Comments:
The normal distribution shows an excellent fit on TST’s residuals and a good fit on random effects.
All other LMER assumptions seem to be fulfilled.
Conclusions:
We can model TST under the normality assumption.
SE is bounded between 0 (when TST = 0) and 100 (when TST = TIB). Thus, a beta regression could be the best option, but at the time of these analyses no packages able to deal with mixed-effects beta regression were available (to our knowledge).
In our sample, only 4 on 82 showed an SE lower than 70%: EM07 (night 5), EM22 (night 36), EM36 (night 7) and EM53 (all 7 nights), whereas only one participant showed an SE = 99% and no participants showed a 100% SE.
# SE < 70%
sleep.long[!is.na(sleep.long$SE)&sleep.long$SE<70,
c("ID","day","SE","TIB","TST","WASO","SOL")]
mean(sleep.long[sleep.long$ID=="MZ53","SE"])
## [1] 61.77
mean(sleep.long[sleep.long$ID=="MZ53","WASO"])
## [1] 148.2857
# SE = 99%
sleep.long[!is.na(sleep.long$SE)&sleep.long$SE==99,
c("ID","day","SE","TIB","TST","WASO","SOL")]
The frequency distribution shows a quite normal shape, which is symmetric and centered approximately on 87-88%. The issue of modeling these data using a normal distribution is that the models would predict values above 100%, whereas TST cannot be higher than TIB (thus, SE cannot be higher than 100%).
hist(sleep.long$SE,breaks=100)
With no clues on how to model the data differently, we accept this limitation and we proceed by evaluating the fit of the normal distribution. Indeed, SE can be seen as a sort of transformation of TST, which can be reasonably modeled assuming residuals normality. SE results should confirm those obtained on TST, and we could focus our goal on assessing the impact of our predictors of interest on SE, instead of predicting meaningful SE values.
m <- lmer(SE~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),sleep.long)
# residuals normal fit (acceptable)
plot(fitdist(resid(m),distr="norm",method="mle"))
# random effects normal fit (acceptable)
plot(fitdist(ranef(m)$'ID'[[1]],distr="norm",method="mle"))
# LMER assumptions (acceptable)
gridExtra::grid.arrange(plot(m,main="Resid vs Fitted"),
qqmath(residuals(m),main="Residual QQPlot"),
dotplot(ranef(m))[[1]],
plot(ranef(m),main="Random Effects QQplot")[[1]],
nrow=2)
Comments:
The normal distribution does not show a very good fit on SE’s residuals or random effects. This is especially due to the four outliers (EM07, EM22, EM36 and MZ53) showing SE lower than 70% (2+ hours of WASO, 2 to 5 hours of TST).
All other LMER assumptions seem to be fulfilled (residuals are independent of fitted values, and random effects are normally distributed and centered on zero).
Let’s see what happens to residuals normality if we exclude the two outliers:
m <- lmer(SE~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),
sleep.long[sleep.long$ID!="EM07"&sleep.long$ID!="EM22"&sleep.long$ID!="EM36"&sleep.long$ID!="MZ53",]) # excluding outliers
# residuals normal fit (acceptable)
plot(fitdist(resid(m),distr="norm",method="mle"))
Comments:
We can also try to log-transform the data considering the whole sample.
m <- lmer(log(SE)~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),data=sleep.long)
# residuals normal fit (worse)
plot(fitdist(resid(m),distr="norm",method="mle"))
Comments:
Conclusions:
Predicting SE scores assuming a normal residuals distribution implies that SE could be above 100% or (with much less probability) below 0%, which cannot be the case. However, with no clues on alternative ways to model our data, the normal distribution seems to be the better way. Looking at the data, we evaluate the normal fit on SE residuals as not excellent but at least acceptable. Removing four outliers would improve the fit.
SOL is a variable bounded on zero, typically skewed on the right, with few cases showing more than 1 hour of latency. Thus, we can try to log transform it to fit a model under normality, or we might use the Gamma family, which is bounded on zero.
# normal fit on log-transformed SOL (a constant is added to avoid cases of log(0))
m <- lmer(log(SOL+.05)~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),sleep.long)
# residuals normal fit (acceptable?)
plot(fitdist(resid(m),distr="norm",method="mle"))
# random effects normal fit (good)
plot(fitdist(ranef(m)$'ID'[[1]],distr="norm",method="mle"))
# LMER assumptions (acceptable)
gridExtra::grid.arrange(plot(m,main="Res. vs Fitted"),
qqmath(residuals(m),main="Residual QQPlot"),
dotplot(ranef(m))[[1]],
plot(ranef(m),main="Random Effects QQplot")[[1]],
nrow=2)
Comments:
the fit of the normal distribution is not very good but acceptable
all other LMER assumptions seem to be fulfilled (residuals are independent of fitted values, and random effects are normally distributed and centered on zero)
Then, we check the fit of the Gamma family.
m <- glmer(SOL+.05~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),sleep.long,
family="Gamma")
plot(fitdist(resid(m),distr="norm",method="mle"))
Conclusions:
The fit of the Gamma family on raw SOL data is worse than that of the normal distribution on log-transformed data. The best option seems to log-transform.
As TST, WASO usually shows a symmetrical shape that can be modeled with a normal distribution (as shown in the GRAPHICS section).
m <- lmer(WASO~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),sleep.long)
# residuals normal fit (good)
plot(fitdist(resid(m),distr="norm",method="mle"))
# random effects normal fit (good)
plot(fitdist(ranef(m)$'ID'[[1]],distr="norm",method="mle"))
# LMER assumptions (good)
gridExtra::grid.arrange(plot(m,main="Resid vs Fitted"),
qqmath(residuals(m),main="Residual QQPlot"),
dotplot(ranef(m))[[1]],
plot(ranef(m),main="Random Effects QQplot")[[1]],
nrow=2)
Comments:
the normal distribution shows a good fit on WASO’s residuals and a good fit on random effects
all other LMER assumptions seem to be fulfilled
Conclusions:
We can model WASO under the normality assumption.
We expect to observe a symmetric shape also for BT.
m <- lmer(BT~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),sleep.long)
# residuals normal fit (acceptable)
plot(fitdist(resid(m),distr="norm",method="mle"))
# random effects normal fit (good)
plot(fitdist(ranef(m)$'ID'[[1]],distr="norm",method="mle"))
# LMER assumptions (good)
gridExtra::grid.arrange(plot(m,main="Resid vs Fitted"),
qqmath(residuals(m),main="Residual QQPlot"),
dotplot(ranef(m))[[1]],
plot(ranef(m),main="Random Effects QQplot")[[1]],
nrow=2)
# outliers
sleep.long[!is.na(sleep.long$BT)&sleep.long$BT>30,c("ID","day","BT","WT")]
Comments:
the fit of the normal distribution is not very good but acceptable
we can observe some outliers (especially EM22 and MZ36) showed BT higher than 34
all other LMER assumptions seem to be fulfilled (residuals are independent of fitted values, and random effects are normally distributed and centered on zero)
Let’s see what happens if we exclude the two outliers:
m <- lmer(BT~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),
sleep.long[sleep.long$ID!="EM22"&sleep.long$ID!="MZ36",]) # excluding outliers
# residuals normal fit (slighlty better)
plot(fitdist(resid(m),distr="norm",method="mle"))
Comments:
Also a log transformation slighlty improves the normal fit, but the improvement is not substantial.
m <- lmer(log(BT)~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),sleep.long)
# residuals normal fit (slighlty better)
plot(fitdist(resid(m),distr="norm",method="mle"))
Conclusions:
The normal fit on BT is not excellent but at least acceptable, removing two outliers seems to improve the fit. There is no evident reason to log transform the data.
We expect to observe a symmetric shape also for WT.
m <- lmer(WT~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),sleep.long)
# residuals normal fit (good)
plot(fitdist(resid(m),distr="norm",method="mle"))
# random effects normal fit (good)
plot(fitdist(ranef(m)$'ID'[[1]],distr="norm",method="mle"))
# LMER assumptions (good)
gridExtra::grid.arrange(plot(m,main="Resid vs Fitted"),
qqmath(residuals(m),main="Residual QQPlot"),
dotplot(ranef(m))[[1]],
plot(ranef(m),main="Random Effects QQplot")[[1]],
nrow=2)
Comments:
the normal distribution fits well both residuals and random effects for WT
all other LMER assumptions seem to be fulfilled (residuals are independent of fitted values, and random effects are centered on zero)
Conclusions:
We can model WT under the normality assumption.
In this section, we perform an influential analysis by using the Cook’s distance (CD) (that indicates, for each observation i, the distance between the parameters vector of the model M and that of M-i. High CD values indicate that the observation has a strong influence on the parameters estimated by the model. A cut-off of 4/N can be used to identify the most influential cases based on the CD.
For each sleep measure, we use the most complex model (with all predictors of interest) to proceed as follows:
we compute the Cook’s distance by considering a cut-off set to 4/N,
we re-estimate the parameters by removing all potentially influent cases, one-by-one,
if a given case shows to substantially influence the parameters estimation, we replicate the influential analysis without that case,
finally, we decide which cases can be removed,
# taking only ID and variables of interest (only continuous), specifying the model
data2check <- sleep.long[!is.na(sleep.long$TST)&!is.na(sleep.long$week.time)&!is.na(sleep.long$SEX)&
!is.na(sleep.long$MEQ.r)&!is.na(sleep.long$PSQI)&!is.na(sleep.long$BDI.II)&
!is.na(sleep.long$ALCOL_week)&!is.na(sleep.long$FUMO..SI.NO.)&
!is.na(sleep.long$CAFFE_week),
c("ID","day","TST","week.time","SEX","MEQ.r","PSQI","BDI.II","NAP",
"ALCOL_week","FUMO..SI.NO.","CAFFE_week")]
m <- lmer(TST~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),data2check)
# computing Cook's dinstance and cutoff
cd <- cooks.distance(m)
data2check$cd <- cd
cutoff <- 4/length(levels(data2check$ID)) # 4/N
# selecting influential cases
index <- c(1:nrow(data2check)) # nrow
pm=cbind(data2check,cd,index)
pm=pm[order(pm$cd,decreasing = TRUE),] # order
pm$infl <- NA
for(i in 1:nrow(pm)){
if(pm[i,"cd"] >= cutoff){ pm[i,"infl"] <- 1 } else { pm[i,"infl"] <- 0 }
}
PM <- pm
# Plotting Cook's distance
par(mfrow=c(1,1),mar=c(6, 5, 4, 2) + 0.1)
plot(cd, xlab="Index",ylab="Cook's distance",
main=paste("Cook's Distance \ncut-off: 4/N =",round(cutoff,3),
" ( NInfl =",nrow(pm[pm$infl==1,]),")"),
pch=20,cex.lab=1.6,cex.axis=1.4,col.main="blue")
points(pm[pm$cd>=cutoff,"index"],
pm[pm$cd>=cutoff,"cd"],pch=20,cex=1.3,col="red")
text(pm[pm$cd>=cutoff,"index"]-10,
pm[pm$cd>=cutoff,"cd"]-0.005,
paste(pm[pm$cd>=cutoff,"ID"],pm[pm$cd>=cutoff,"day"]),cex=.8)
abline(h=cutoff,
lwd=1.8,col="blue",lty=2)
# participants showing cases above the cutoff
levels(as.factor(as.character(pm[pm$cd>cutoff,"ID"])))
## [1] "EM06" "EM07" "EM09" "EM22" "EM23" "EM36" "EM42" "em51" "EM53" "MZ04"
## [11] "MZ08" "MZ22" "MZ25" "MZ34" "MZ39" "MZ40" "MZ44" "MZ46" "MZ50" "MZ55"
## [21] "MZ56"
Comments:
In the next step, we re-estimate the parameters by excluding each potentially influential participant, one-by-one. We focus on substantial changes in the estimated parameters, such that the excluded cases implies a drastic change in the interpretation of the effects.
# parameters with all participants
parameters <- as.data.frame(round(summary(m)$coefficients,3))
parameters$ID <- "All"
IDs <- levels(as.factor(as.character(PM[PM$cd>cutoff,"ID"]))) # ID-day couples with CD > cutoff
for(ID in IDs){
new.data <- PM[PM$ID!=ID,]
m4 <- lmer(TST~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),new.data)
new.data <- as.data.frame(round(summary(m4)$coefficients,3))
new.data$ID <- ID
parameters <- rbind(parameters,new.data)
}
parameters <- cbind(parameters,rep(c("Intercept","week.timewe","SEXM","MEQ.r","PSQI","BDI.II","NAP",
"ALCOL_week","FUMO..SI.NO.1","CAFFE_week"),
length(IDs)+1))
colnames(parameters) <- c("est","st.err","t","ID","pred")
parameters
# week.time
p1 <- ggplot(parameters[parameters$pred=="week.timewe",],aes(ID,est))+
geom_point()+ggtitle("week.time effect - original (red) vs. removing cases")+xlab("")+
geom_point(data=parameters[parameters$pred=="week.timewe"¶meters$ID=="All",],colour="red")+
geom_text(data=parameters[parameters$pred=="week.timewe" & (parameters$est<=12|parameters$est>=14),],
aes(label=ID),nudge_x=-1) + theme(axis.text.x = element_blank())
p2 <- ggplot(parameters[parameters$pred=="week.timewe",],aes(ID,t))+
geom_point()+ggtitle("week.time effect - original (red) vs. removing cases")+xlab("")+
geom_point(data=parameters[parameters$pred=="week.timewe"¶meters$ID=="All",],colour="red")+
geom_text(data=parameters[parameters$pred=="week.timewe" & (parameters$est<=12|parameters$est>=14),],
aes(label=ID),nudge_x=-1) + theme(axis.text.x = element_blank())
gridExtra::grid.arrange(p1,p2,nrow=2)
Comments:
Larger differences are shown by the parameters estimated for week.time (ranging from 12 to 16, with t-values ranging from 2.0 to 2.4), MEQ-R (ranging from -1.25 to 0.00, with t-values ranging from -1.00 to 0.00, and MZ44 and MZ56 leading to an average decrease of .4 in the estimated parameter), ALCOHOL (ranging from -1.6 to -.2, with t-values ranging from -1.0 to 0.0, and MZ44 leading to an absolute increase of .75 in the estimated parameter), SMOKE (ranging from -15 to -8, with t-values ranging from -1.4 to -.8, and EM06, EM22 and EM36 leading to an average absolute increase of .2 in the estimated parameter).
EM22 and EM36 are among the most influential cases. Interestingly, they are also two of the four outliers found for SE.
None of the potentially influential cases seems to substantially change the interpretation of the results (i.e., non-significant effects remain non-significant, positive effects remain positive and negative effects remain negative).
Conclusions:
We decide to keep all participants in TST analyses.
# taking only ID and variables of interest (only continuous), specifying the model
data2check <- sleep.long[!is.na(sleep.long$SE)&!is.na(sleep.long$week.time)&!is.na(sleep.long$SEX)&
!is.na(sleep.long$MEQ.r)&!is.na(sleep.long$PSQI)&!is.na(sleep.long$BDI.II)&
!is.na(sleep.long$ALCOL_week)&!is.na(sleep.long$FUMO..SI.NO.)&
!is.na(sleep.long$CAFFE_week),
c("ID","day","SE","week.time","SEX","MEQ.r","PSQI","BDI.II","NAP",
"ALCOL_week","FUMO..SI.NO.","CAFFE_week")]
m <- lmer(SE~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),data2check)
# computing Cook's dinstance and cutoff
cd <- cooks.distance(m)
data2check$cd <- cd
cutoff <- 4/length(levels(data2check$ID)) # 4/N
# selecting influential cases
index <- c(1:nrow(data2check)) # nrow
pm=cbind(data2check,cd,index)
pm=pm[order(pm$cd,decreasing = TRUE),] # order
pm$infl <- NA
for(i in 1:nrow(pm)){
if(pm[i,"cd"] >= cutoff){ pm[i,"infl"] <- 1 } else { pm[i,"infl"] <- 0 }
}
PM <- pm
# Plotting mahalanobis distance
par(mfrow=c(1,1),mar=c(6, 5, 4, 2) + 0.1)
plot(cd, xlab="Index",ylab="Cook's distance",
main=paste("Cook's Distance \ncut-off: 4/N =",round(cutoff,3),
" ( NInfl =",nrow(pm[pm$infl==1,]),")"),
pch=20,cex.lab=1.6,cex.axis=1.4,col.main="blue")
points(pm[pm$cd>=cutoff,"index"],
pm[pm$cd>=cutoff,"cd"],pch=20,cex=1.3,col="red")
text(pm[pm$cd>=cutoff,"index"]-10,
pm[pm$cd>=cutoff,"cd"]-0.005,
paste(pm[pm$cd>=cutoff,"ID"],pm[pm$cd>=cutoff,"day"]),cex=.8)
abline(h=cutoff,
lwd=1.8,col="blue",lty=2)
# participants showing cases above the cutoff
levels(as.factor(as.character(pm[pm$cd>cutoff,"ID"])))
## [1] "EM02" "EM04" "EM07" "EM08" "EM10" "EM12" "EM16" "EM22" "EM23" "EM29"
## [11] "EM36" "EM39" "EM54" "MZ03" "MZ04" "MZ05" "MZ09" "MZ15" "MZ21" "MZ25"
## [21] "MZ30" "MZ35" "MZ36" "MZ38" "MZ53" "MZ56"
Comments:
In the next step, we re-estimate the parameters by excluding each potentially influential participant, one-by-one. We focus on substantial changes in the estimated parameters, such that the excluded cases implies a drastic change in the interpretation of the effects
# parameters with all participants
parameters <- as.data.frame(round(summary(m)$coefficients,3))
parameters$ID <- "All"
IDs <- levels(as.factor(as.character(data2check[data2check$cd>cutoff,"ID"]))) # ID-day couples with CD > cutoff
for(ID in IDs){
new.data <- data2check[data2check$ID!=ID,]
m4 <- lmer(SE~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),new.data)
new.data <- as.data.frame(round(summary(m4)$coefficients,3))
new.data$ID <- ID
parameters <- rbind(parameters,new.data)
}
parameters <- cbind(parameters,rep(c("Intercept","week.timewe","SEXM","MEQ.r","PSQI","BDI.II","NAP",
"ALCOL_week","FUMO..SI.NO.1","CAFFE_week"),
length(IDs)+1))
colnames(parameters) <- c("est","st.err","t","ID","pred")
parameters
# week.time
p1 <- ggplot(parameters[parameters$pred=="week.timewe",],aes(ID,est))+
geom_point()+ggtitle("week.time effect - original (red) vs. removing cases")+xlab("")+
geom_point(data=parameters[parameters$pred=="week.timewe"¶meters$ID=="All",],colour="red")+
geom_text(data=parameters[parameters$pred=="week.timewe" & (parameters$est<=.1|parameters$est>=.3),],
aes(label=ID),nudge_x=-1) + theme(axis.text.x = element_blank())
p2 <- ggplot(parameters[parameters$pred=="week.timewe",],aes(ID,t))+
geom_point()+ggtitle("week.time effect - original (red) vs. removing cases")+xlab("")+
geom_point(data=parameters[parameters$pred=="week.timewe"¶meters$ID=="All",],colour="red")+
geom_text(data=parameters[parameters$pred=="week.timewe" & (parameters$est<=.1|parameters$est>=.3),],
aes(label=ID),nudge_x=-1) + theme(axis.text.x = element_blank())
gridExtra::grid.arrange(p1,p2,nrow=2)
Comments:
Larger differences are shown by the parameter estimated for NAP (ranging from -.6 to -1.1, with t-values ranging from -1.6 to -2.3, and EM07 showing the strongest influence, leading to an increase of .4 in the estimated NAP coefficient). Further differences are shown by the parameters estimated for sex (ranging from -2.0 to -1.0, with t-values ranging from -1.6 to -1.0, and MZ53 leading to an absolute increase of .75 in the estimated parameter), and smoke (ranging from -3 to -1.5, with t-values ranging from -2.5 to -1.5, and MZ53 leading to an absolute increase of 1.0 in the estimated parameter).
MZ53 shows a degree of influence on almost all parameters, although the interpretation of the results does not change substantially (i.e., non-significant effects remain non-significant, positive effects remain positive and negative effects remain negative). The only exception is smoke, whose effect seems to strongly depend on MZ53.
MZ53 is also one of the four outliers found for SE, with all nights showing an SE < .70.
Thus, we replicate the influential analysis by removing EM07 and MZ53. We start with the former.
OUT <- data2check[data2check$ID!="EM07",] # excluding influential case
m <- lmer(SE~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),OUT)
# parameters with all participants
parameters <- as.data.frame(round(summary(m)$coefficients,3))
parameters$ID <- "All"
IDs <- levels(as.factor(as.character(OUT[OUT$cd>cutoff,"ID"]))) # ID-day couples with CD > cutoff
for(ID in IDs){
new.data <- OUT[OUT$ID!=ID,]
m4 <- lmer(SE~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),new.data)
new.data <- as.data.frame(round(summary(m4)$coefficients,3))
new.data$ID <- ID
parameters <- rbind(parameters,new.data)
}
parameters <- cbind(parameters,rep(c("Intercept","week.timewe","SEXM","MEQ.r","PSQI","BDI.II","NAP",
"ALCOL_week","FUMO..SI.NO.1","CAFFE_week"),
length(IDs)+1))
colnames(parameters) <- c("est","st.err","t","ID","pred")
parameters
Comments:
The exclusion of EM07 decreases the effect of NAP and makes the estimated parameter more consistent.
Participant MZ53 continues to show a degree of influence on many parameters, and particularly on smoke.
Thus, we replicate the influential analysis by removing MZ53.
OUT <- OUT[OUT$ID!="MZ53",] # excluding influential case
m <- lmer(SE~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),OUT)
# parameters with all participants
parameters <- as.data.frame(round(summary(m)$coefficients,3))
parameters$ID <- "All"
IDs <- levels(as.factor(as.character(OUT[OUT$cd>cutoff,"ID"]))) # ID-day couples with CD > cutoff
for(ID in IDs){
new.data <- OUT[OUT$ID!=ID,]
m4 <- lmer(SE~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),new.data)
new.data <- as.data.frame(round(summary(m4)$coefficients,3))
new.data$ID <- ID
parameters <- rbind(parameters,new.data)
}
parameters <- cbind(parameters,rep(c("Intercept","week.timewe","SEXM","MEQ.r","PSQI","BDI.II","NAP",
"ALCOL_week","FUMO..SI.NO.1","CAFFE_week"),
length(IDs)+1))
colnames(parameters) <- c("est","st.err","t","ID","pred")
parameters
Comments:
The exclusion of MZ53 decreases the effect of smoke and makes the estimated parameter more consistent. None of the other parameters show to be substantially affected by the removal of this participant.
We do not observe other potentially influential cases.
Conclusions:
We decide to perform the analyses on SE without EM07 and MZ53.
# taking only ID and variables of interest (only continuous), specifying the model
data2check <- sleep.long[!is.na(sleep.long$SOL)&!is.na(sleep.long$week.time)&!is.na(sleep.long$SEX)&
!is.na(sleep.long$MEQ.r)&!is.na(sleep.long$PSQI)&!is.na(sleep.long$BDI.II)&
!is.na(sleep.long$ALCOL_week)&!is.na(sleep.long$FUMO..SI.NO.)&
!is.na(sleep.long$CAFFE_week),
c("ID","day","SOL","week.time","SEX","MEQ.r","PSQI","BDI.II","NAP",
"ALCOL_week","FUMO..SI.NO.","CAFFE_week")]
m <- lmer(log(SOL+.05)~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),sleep.long)
# computing Cook's dinstance and cutoff
cd <- cooks.distance(m)
data2check$cd <- cd
cutoff <- 4/length(levels(data2check$ID)) # 4/N
# selecting influential cases
index <- c(1:nrow(data2check)) # nrow
pm=cbind(data2check,cd,index)
pm=pm[order(pm$cd,decreasing = TRUE),] # order
pm$infl <- NA
for(i in 1:nrow(pm)){
if(pm[i,"cd"] >= cutoff){ pm[i,"infl"] <- 1 } else { pm[i,"infl"] <- 0 }
}
PM <- pm
# Plotting mahalanobis distance
par(mfrow=c(1,1),mar=c(6, 5, 4, 2) + 0.1)
plot(cd, xlab="Index",ylab="Cook's distance",
main=paste("Cook's Distance \ncut-off: 4/N =",round(cutoff,3),
" ( NInfl =",nrow(pm[pm$infl==1,]),")"),
pch=20,cex.lab=1.6,cex.axis=1.4,col.main="blue")
points(pm[pm$cd>=cutoff,"index"],
pm[pm$cd>=cutoff,"cd"],pch=20,cex=1.3,col="red")
text(pm[pm$cd>=cutoff,"index"]-10,
pm[pm$cd>=cutoff,"cd"]-0.005,
paste(pm[pm$cd>=cutoff,"ID"],pm[pm$cd>=cutoff,"day"]),cex=.8)
abline(h=cutoff,
lwd=1.8,col="blue",lty=2)
# participants showing cases above the cutoff
levels(as.factor(as.character(pm[pm$cd>cutoff,"ID"])))
## [1] "EM04" "EM05" "EM08" "EM09" "EM12" "EM18" "EM23" "EM24" "EM26" "EM44"
## [11] "EM53" "EM54" "MZ02" "MZ04" "MZ05" "MZ14" "MZ15" "MZ21" "MZ22" "MZ23"
## [21] "MZ31" "MZ35" "MZ36" "MZ39" "MZ47" "MZ49" "MZ50" "MZ51" "MZ55" "MZ57"
Comments:
In the next step, we re-estimate the parameters by excluding each potentially influential participant, one-by-one. We focus on substantial changes in the estimated parameters, such that the excluded cases implies a drastic change in the interpretation of the effects.
# parameters with all participants
parameters <- as.data.frame(round(summary(m)$coefficients,3))
parameters$ID <- "All"
IDs <- levels(as.factor(as.character(data2check[data2check$cd>cutoff,"ID"]))) # ID-day couples with CD > cutoff
for(ID in IDs){
new.data <- data2check[data2check$ID!=ID,]
m4 <- lmer(log(SOL+.05)~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),new.data)
new.data <- as.data.frame(round(summary(m4)$coefficients,3))
new.data$ID <- ID
parameters <- rbind(parameters,new.data)
}
parameters <- cbind(parameters,rep(c("Intercept","week.timewe","SEXM","MEQ.r","PSQI","BDI.II","NAP",
"ALCOL_week","FUMO..SI.NO.1","CAFFE_week"),
length(IDs)+1))
colnames(parameters) <- c("est","st.err","t","ID","pred")
parameters
# week.time
p1 <- ggplot(parameters[parameters$pred=="week.timewe",],aes(ID,est))+
geom_point()+ggtitle("week.time effect - original (red) vs. removing cases")+xlab("")+
geom_point(data=parameters[parameters$pred=="week.timewe"¶meters$ID=="All",],colour="red")+
geom_text(data=parameters[parameters$pred=="week.timewe" & parameters$est<=(-.1),],
aes(label=ID),nudge_x=-1) + theme(axis.text.x = element_blank())
p2 <- ggplot(parameters[parameters$pred=="week.timewe",],aes(ID,t))+
geom_point()+ggtitle("week.time effect - original (red) vs. removing cases")+xlab("")+
geom_point(data=parameters[parameters$pred=="week.timewe"¶meters$ID=="All",],colour="red")+
geom_text(data=parameters[parameters$pred=="week.timewe" & parameters$est<=(-.1),],
aes(label=ID),nudge_x=-1) + theme(axis.text.x = element_blank())
gridExtra::grid.arrange(p1,p2,nrow=2)
Comments:
None of the estimated parameters seems to be substantially affected by potentially influential cases.
Two participants show stronger influence on several parameters: MZ15 for sex, MEQ-r, PSQI, and BDI; MZ23 for alcohol and coffee. However, also these participants do not change the interpretation of the results (i.e., non-significant effects remain non-significant, positive effects remain positive and negative effects remain negative).
Conclusions:
We decide to keep all participants in SOL analyses.
# taking only ID and variables of interest (only continuous), specifying the model
data2check <- sleep.long[!is.na(sleep.long$WASO)&!is.na(sleep.long$week.time)&!is.na(sleep.long$SEX)&
!is.na(sleep.long$MEQ.r)&!is.na(sleep.long$PSQI)&!is.na(sleep.long$BDI.II)&
!is.na(sleep.long$ALCOL_week)&!is.na(sleep.long$FUMO..SI.NO.)&
!is.na(sleep.long$CAFFE_week),
c("ID","day","WASO","week.time","SEX","MEQ.r","PSQI","BDI.II","NAP",
"ALCOL_week","FUMO..SI.NO.","CAFFE_week")]
m <- lmer(WASO~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),data2check)
# computing Cook's dinstance and cutoff
cd <- cooks.distance(m)
data2check$cd <- cd
cutoff <- 4/length(levels(data2check$ID)) # 4/N
# selecting influential cases
index <- c(1:nrow(data2check)) # nrow
pm=cbind(data2check,cd,index)
pm=pm[order(pm$cd,decreasing = TRUE),] # order
pm$infl <- NA
for(i in 1:nrow(pm)){
if(pm[i,"cd"] >= cutoff){ pm[i,"infl"] <- 1 } else { pm[i,"infl"] <- 0 }
}
PM <- pm
# Plotting mahalanobis distance
par(mfrow=c(1,1),mar=c(6, 5, 4, 2) + 0.1)
plot(cd, xlab="Index",ylab="Cook's distance",
main=paste("Cook's Distance \ncut-off: 4/N =",round(cutoff,3),
" ( NInfl =",nrow(pm[pm$infl==1,]),")"),
pch=20,cex.lab=1.6,cex.axis=1.4,col.main="blue")
points(pm[pm$cd>=cutoff,"index"],
pm[pm$cd>=cutoff,"cd"],pch=20,cex=1.3,col="red")
text(pm[pm$cd>=cutoff,"index"]-10,
pm[pm$cd>=cutoff,"cd"]-0.005,
paste(pm[pm$cd>=cutoff,"ID"],pm[pm$cd>=cutoff,"day"]),cex=.8)
abline(h=cutoff,
lwd=1.8,col="blue",lty=2)
# participants showing cases above the cutoff
levels(as.factor(as.character(pm[pm$cd>cutoff,"ID"])))
## [1] "EM04" "EM07" "EM09" "EM10" "EM12" "EM17" "EM22" "EM29" "EM38" "EM39"
## [11] "EM50" "EM53" "EM54" "EM55" "MZ03" "MZ16" "MZ17" "MZ25" "MZ30" "MZ34"
## [21] "MZ36" "MZ42" "MZ46" "MZ49" "MZ53" "MZ56"
Comments:
In the next step, we re-estimate the parameters by excluding each potentially influential participant, one-by-one. We focus on substantial changes in the estimated parameters, such that the excluded cases implies a drastic change in the interpretation of the effects.
# parameters with all participants
parameters <- as.data.frame(round(summary(m)$coefficients,3))
parameters$ID <- "All"
IDs <- levels(as.factor(as.character(PM[PM$cd>cutoff,"ID"]))) # ID-day couples with CD > cutoff
for(ID in IDs){
new.data <- PM[PM$ID!=ID,]
m4 <- lmer(WASO~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),new.data)
new.data <- as.data.frame(round(summary(m4)$coefficients,3))
new.data$ID <- ID
parameters <- rbind(parameters,new.data)
}
parameters <- cbind(parameters,rep(c("Intercept","week.timewe","SEXM","MEQ.r","PSQI","BDI.II","NAP",
"ALCOL_week","FUMO..SI.NO.1","CAFFE_week"),
length(IDs)+1))
colnames(parameters) <- c("est","st.err","t","ID","pred")
parameters
# week.time
p1 <- ggplot(parameters[parameters$pred=="week.timewe",],aes(ID,est))+
geom_point()+ggtitle("week.time effect - original (red) vs. removing cases")+xlab("")+
geom_point(data=parameters[parameters$pred=="week.timewe"¶meters$ID=="All",],colour="red")+
geom_text(data=parameters[parameters$pred=="week.timewe" & parameters$est>=1.5,],
aes(label=ID),nudge_x=-1) + theme(axis.text.x = element_blank())
p2 <- ggplot(parameters[parameters$pred=="week.timewe",],aes(ID,t))+
geom_point()+ggtitle("week.time effect - original (red) vs. removing cases")+xlab("")+
geom_point(data=parameters[parameters$pred=="week.timewe"¶meters$ID=="All",],colour="red")+
geom_text(data=parameters[parameters$pred=="week.timewe" & parameters$est>=1.5,],
aes(label=ID),nudge_x=-1) + theme(axis.text.x = element_blank())
gridExtra::grid.arrange(p1,p2,nrow=2)
Comments:
Larger differences are shown by the parameter estimated for smoke (ranging from 7 to 13, with t-values ranging from 1.4 to 2.2), whose effect is strongly influenced by MZ53 (leading to an increase of 4 in the estimated smoke coefficient).
MZ53 is the participant influencing more coefficients (week.time, sex, MEQ-r, PSQI, alcohol, and smoke), followed by EM54 (affecting the parameter for alcohol), MZ16 (influencing the effect of coffee) and MZ25 (influencing the effect of BDI-II). MZ53 is the most influential case (this participant is also one of the outliers for SE).
Thus, we replicate the influential analysis by removing MZ53.
OUT <- data2check[data2check$ID!="MZ53",] # excluding influential case
m <- lmer(WASO~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),OUT)
# parameters with all participants
parameters <- as.data.frame(round(summary(m)$coefficients,3))
parameters$ID <- "All"
IDs <- levels(as.factor(as.character(OUT[OUT$cd>cutoff,"ID"]))) # ID-day couples with CD > cutoff
for(ID in IDs){
new.data <- OUT[OUT$ID!=ID,]
m4 <- lmer(WASO~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),new.data)
new.data <- as.data.frame(round(summary(m4)$coefficients,3))
new.data$ID <- ID
parameters <- rbind(parameters,new.data)
}
parameters <- cbind(parameters,rep(c("Intercept","week.timewe","SEXM","MEQ.r","PSQI","BDI.II","NAP",
"ALCOL_week","FUMO..SI.NO.1","CAFFE_week"),
length(IDs)+1))
colnames(parameters) <- c("est","st.err","t","ID","pred")
parameters
Comments:
The exclusion of MZ53 decreases and makes more consistent the parameter estimated for smoke. Slight changes are observable also for other parameters.
There are still some participants showing a degree of influence on the estimation of the parameters, especially MZ25 (affecting sex and PSQI), MZ25 (affecting PSQI, BDI-II, and smoke), and MZ36 (affecting sex, alcohol, and smoke). However, the exclusion of these participant does not substantially change the estimated parameters.
Conclusions:
We decide to exclude only **MZ53* from WASo analyses.
# taking only ID and variables of interest (only continuous), specifying the model
data2check <- sleep.long[!is.na(sleep.long$BT)&!is.na(sleep.long$week.time)&!is.na(sleep.long$SEX)&
!is.na(sleep.long$MEQ.r)&!is.na(sleep.long$PSQI)&!is.na(sleep.long$BDI.II)&
!is.na(sleep.long$ALCOL_week)&!is.na(sleep.long$FUMO..SI.NO.)&
!is.na(sleep.long$CAFFE_week),
c("ID","day","BT","week.time","SEX","MEQ.r","PSQI","BDI.II","NAP",
"ALCOL_week","FUMO..SI.NO.","CAFFE_week")]
m <- lmer(BT~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),data2check)
# computing Cook's dinstance and cutoff
cd <- cooks.distance(m)
data2check$cd <- cd
cutoff <- 4/length(levels(data2check$ID)) # 4/N
# selecting influential cases
index <- c(1:nrow(data2check)) # nrow
pm=cbind(data2check,cd,index)
pm=pm[order(pm$cd,decreasing = TRUE),] # order
pm$infl <- NA
for(i in 1:nrow(pm)){
if(pm[i,"cd"] >= cutoff){ pm[i,"infl"] <- 1 } else { pm[i,"infl"] <- 0 }
}
PM <- pm
# Plotting mahalanobis distance
par(mfrow=c(1,1),mar=c(6, 5, 4, 2) + 0.1)
plot(cd, xlab="Index",ylab="Cook's distance",
main=paste("Cook's Distance \ncut-off: 4/N =",round(cutoff,3),
" ( NInfl =",nrow(pm[pm$infl==1,]),")"),
pch=20,cex.lab=1.6,cex.axis=1.4,col.main="blue")
points(pm[pm$cd>=cutoff,"index"],
pm[pm$cd>=cutoff,"cd"],pch=20,cex=1.3,col="red")
text(pm[pm$cd>=cutoff,"index"]-10,
pm[pm$cd>=cutoff,"cd"]-0.005,
paste(pm[pm$cd>=cutoff,"ID"],pm[pm$cd>=cutoff,"day"]),cex=.8)
abline(h=cutoff,
lwd=1.8,col="blue",lty=2)
# participants showing cases above the cutoff
levels(as.factor(as.character(pm[pm$cd>cutoff,"ID"])))
## [1] "EM07" "EM08" "EM09" "EM22" "EM23" "EM29" "EM36" "EM40" "EM42" "EM45"
## [11] "MZ03" "MZ04" "MZ17" "MZ22" "MZ25" "MZ34" "MZ40" "MZ50" "MZ53" "MZ54"
## [21] "MZ55" "MZ56"
Comments:
In the next step, we re-estimate the parameters by excluding each potentially influential participant, one-by-one. We focus on substantial changes in the estimated parameters, such that the excluded cases implies a drastic change in the interpretation of the effects.
# parameters with all participants
parameters <- as.data.frame(round(summary(m)$coefficients,3))
parameters$ID <- "All"
IDs <- levels(as.factor(as.character(data2check[data2check$cd>cutoff,"ID"]))) # ID-day couples with CD > cutoff
for(ID in IDs){
new.data <- PM[PM$ID!=ID,]
m4 <- lmer(BT~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),new.data)
new.data <- as.data.frame(round(summary(m4)$coefficients,3))
new.data$ID <- ID
parameters <- rbind(parameters,new.data)
}
parameters <- cbind(parameters,rep(c("Intercept","week.timewe","SEXM","MEQ.r","PSQI","BDI.II","NAP",
"ALCOL_week","FUMO..SI.NO.1","CAFFE_week"),
length(IDs)+1))
colnames(parameters) <- c("est","st.err","t","ID","pred")
parameters
# week.time
p1 <- ggplot(parameters[parameters$pred=="week.timewe",],aes(ID,est))+
geom_point()+ggtitle("week.time effect - original (red) vs. removing cases")+xlab("")+
geom_point(data=parameters[parameters$pred=="week.timewe"¶meters$ID=="All",],colour="red")+
theme(axis.text.x = element_blank())
p2 <- ggplot(parameters[parameters$pred=="week.timewe",],aes(ID,t))+
geom_point()+ggtitle("week.time effect - original (red) vs. removing cases")+xlab("")+
geom_point(data=parameters[parameters$pred=="week.timewe"¶meters$ID=="All",],colour="red")+
theme(axis.text.x = element_blank())
gridExtra::grid.arrange(p1,p2,nrow=2)
# SEX
p1 <- ggplot(parameters[parameters$pred=="SEXM",],aes(ID,est))+
geom_point()+ggtitle("SEX effect - original (red) vs. removing cases")+xlab("")+
geom_point(data=parameters[parameters$pred=="SEXM"¶meters$ID=="All",],colour="red")+
geom_text(data=parameters[parameters$pred=="SEXM" & parameters$est<=.1,],
aes(label=ID),nudge_x=-1) + theme(axis.text.x = element_blank())
p2 <- ggplot(parameters[parameters$pred=="SEXM",],aes(ID,t))+
geom_point()+ggtitle("SEX effect - original (red) vs. removing cases")+xlab("")+
geom_point(data=parameters[parameters$pred=="SEXM"¶meters$ID=="All",],colour="red")+
geom_text(data=parameters[parameters$pred=="SEXM" & parameters$est<=.1,],
aes(label=ID),nudge_x=-1) + theme(axis.text.x = element_blank())
gridExtra::grid.arrange(p1,p2,nrow=2)
# MEQ.r
p1 <- ggplot(parameters[parameters$pred=="MEQ.r",],aes(ID,est))+
geom_point()+ggtitle("MEQ-r effect - original (red) vs. removing cases")+xlab("")+
geom_point(data=parameters[parameters$pred=="MEQ.r"¶meters$ID=="All",],colour="red")+
geom_text(data=parameters[parameters$pred=="MEQ.r" & (parameters$est<=(-.15)|parameters$est>=(-.13)),],
aes(label=ID),nudge_x=-1) + theme(axis.text.x = element_blank())
p2 <- ggplot(parameters[parameters$pred=="MEQ.r",],aes(ID,t))+
geom_point()+ggtitle("MEQ-r effect - original (red) vs. removing cases")+xlab("")+
geom_point(data=parameters[parameters$pred=="MEQ.r"¶meters$ID=="All",],colour="red")+
geom_text(data=parameters[parameters$pred=="MEQ.r" & (parameters$est<=(-.15)|parameters$est>=(-.13)),],
aes(label=ID),nudge_x=-1) + theme(axis.text.x = element_blank())
gridExtra::grid.arrange(p1,p2,nrow=2)
# PSQI
p1 <- ggplot(parameters[parameters$pred=="PSQI",],aes(ID,est))+
geom_point()+ggtitle("PSQI effect - original (red) vs. removing cases")+xlab("")+
geom_point(data=parameters[parameters$pred=="PSQI"¶meters$ID=="All",],colour="red")+
geom_text(data=parameters[parameters$pred=="PSQI" & parameters$est<=(-.02),],
aes(label=ID),nudge_x=-1) + theme(axis.text.x = element_blank())
p2 <- ggplot(parameters[parameters$pred=="PSQI",],aes(ID,t))+
geom_point()+ggtitle("PSQI effect - original (red) vs. removing cases")+xlab("")+
geom_point(data=parameters[parameters$pred=="PSQI"¶meters$ID=="All",],colour="red")+
geom_text(data=parameters[parameters$pred=="PSQI" & parameters$est<=(-.02),],
aes(label=ID),nudge_x=-1) + theme(axis.text.x = element_blank())
gridExtra::grid.arrange(p1,p2,nrow=2)
# BDI-II
p1 <- ggplot(parameters[parameters$pred=="BDI.II",],aes(ID,est))+
geom_point()+ggtitle("BDI-II effect - original (red) vs. removing cases")+xlab("")+
geom_point(data=parameters[parameters$pred=="BDI.II"¶meters$ID=="All",],colour="red")+
geom_text(data=parameters[parameters$pred=="BDI.II" & parameters$est<=(-.01),],
aes(label=ID),nudge_x=-1) + theme(axis.text.x = element_blank())
p2 <- ggplot(parameters[parameters$pred=="BDI.II",],aes(ID,t))+
geom_point()+ggtitle("BDI-II effect - original (red) vs. removing cases")+xlab("")+
geom_point(data=parameters[parameters$pred=="BDI.II"¶meters$ID=="All",],colour="red")+
geom_text(data=parameters[parameters$pred=="BDI.II" & parameters$est<=(-.01),],
aes(label=ID),nudge_x=-1) + theme(axis.text.x = element_blank())
gridExtra::grid.arrange(p1,p2,nrow=2)
# NAP
p1 <- ggplot(parameters[parameters$pred=="NAP",],aes(ID,est))+
geom_point()+ggtitle("NAP effect - original (red) vs. removing cases")+xlab("")+
geom_point(data=parameters[parameters$pred=="NAP"¶meters$ID=="All",],colour="red")
p2 <- ggplot(parameters[parameters$pred=="NAP",],aes(ID,t))+
geom_point()+ggtitle("NAP effect - original (red) vs. removing cases")+xlab("")+
geom_point(data=parameters[parameters$pred=="NAP"¶meters$ID=="All",],colour="red")
gridExtra::grid.arrange(p1,p2,nrow=2)
# ALCOL
p1 <- ggplot(parameters[parameters$pred=="ALCOL_week",],aes(ID,est))+
geom_point()+ggtitle("ALCOHOL effect - original (red) vs. removing cases")+xlab("")+
geom_point(data=parameters[parameters$pred=="ALCOL_week"¶meters$ID=="All",],colour="red")+
geom_text(data=parameters[parameters$pred=="ALCOL_week" & parameters$est<=.05,],
aes(label=ID),nudge_x=-1) + theme(axis.text.x = element_blank())
p2 <- ggplot(parameters[parameters$pred=="ALCOL_week",],aes(ID,t))+
geom_point()+ggtitle("ALCOHOL effect - original (red) vs. removing cases")+xlab("")+
geom_point(data=parameters[parameters$pred=="ALCOL_week"¶meters$ID=="All",],colour="red")+
geom_text(data=parameters[parameters$pred=="ALCOL_week" & parameters$est<=.05,],
aes(label=ID),nudge_x=-1) + theme(axis.text.x = element_blank())
gridExtra::grid.arrange(p1,p2,nrow=2)
# SMOKE
p1 <- ggplot(parameters[parameters$pred=="FUMO..SI.NO.1",],aes(ID,est))+
geom_point()+ggtitle("SMOKE effect - original (red) vs. removing cases")+xlab("")+
geom_point(data=parameters[parameters$pred=="FUMO..SI.NO.1"¶meters$ID=="All",],colour="red")+
geom_text(data=parameters[parameters$pred=="FUMO..SI.NO.1" & parameters$est<=.35,],
aes(label=ID),nudge_x=-1) + theme(axis.text.x = element_blank())
p2 <- ggplot(parameters[parameters$pred=="FUMO..SI.NO.1",],aes(ID,t))+
geom_point()+ggtitle("SMOKE effect - original (red) vs. removing cases")+xlab("")+
geom_point(data=parameters[parameters$pred=="FUMO..SI.NO.1"¶meters$ID=="All",],colour="red")+
geom_text(data=parameters[parameters$pred=="FUMO..SI.NO.1" & parameters$est<=.35,],
aes(label=ID),nudge_x=-1) + theme(axis.text.x = element_blank())
gridExtra::grid.arrange(p1,p2,nrow=2)
# COFFEE
p1 <- ggplot(parameters[parameters$pred=="CAFFE_week",],aes(ID,est))+
geom_point()+ggtitle("COFFE effect - original (red) vs. removing cases")+xlab("")+
geom_point(data=parameters[parameters$pred=="CAFFE_week"¶meters$ID=="All",],colour="red")+
geom_text(data=parameters[parameters$pred=="CAFFE_week" & parameters$est>=.009,],
aes(label=ID),nudge_x=-1) + theme(axis.text.x = element_blank())
p2 <- ggplot(parameters[parameters$pred=="CAFFE_week",],aes(ID,t))+
geom_point()+ggtitle("COFFE effect - original (red) vs. removing cases")+xlab("")+
geom_point(data=parameters[parameters$pred=="CAFFE_week"¶meters$ID=="All",],colour="red")+
geom_text(data=parameters[parameters$pred=="CAFFE_week" & parameters$est>=.009,],
aes(label=ID),nudge_x=-1) + theme(axis.text.x = element_blank())
gridExtra::grid.arrange(p1,p2,nrow=2)
Comments:
Two participants show stronger influence on several parameters: EM36 for week.time, sex, MEQ-r, PSQI, and smoke; MZ22 for EQr, alcohol and coffee. Alcohol is the mostly influenced parameter, whose magitude is decreased .072 (t = 1.89) to .047 (t = 1.22).
However, the exclusion of these participants does not change the interpretation of the results (i.e., non-significant effects remain non-significant, positive effects remain positive and negative effects remain negative).
Conclusions:
We decide to keep all participants in BT data analysis.
# taking only ID and variables of interest (only continuous), specifying the model
data2check <- sleep.long[!is.na(sleep.long$WT.dayAfter)&!is.na(sleep.long$week.time)&!is.na(sleep.long$SEX)&
!is.na(sleep.long$MEQ.r)&!is.na(sleep.long$PSQI)&!is.na(sleep.long$BDI.II)&
!is.na(sleep.long$ALCOL_week)&!is.na(sleep.long$FUMO..SI.NO.)&
!is.na(sleep.long$CAFFE_week),
c("ID","day","WT.dayAfter","week.time","SEX","MEQ.r","PSQI","BDI.II","NAP",
"ALCOL_week","FUMO..SI.NO.","CAFFE_week")]
m <- lmer(WT.dayAfter~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),data2check)
# computing Cook's dinstance and cutoff
cd <- cooks.distance(m)
data2check$cd <- cd
cutoff <- 4/length(levels(data2check$ID)) # 4/N
# selecting influential cases
index <- c(1:nrow(data2check)) # nrow
pm=cbind(data2check,cd,index)
pm=pm[order(pm$cd,decreasing = TRUE),] # order
pm$infl <- NA
for(i in 1:nrow(pm)){
if(pm[i,"cd"] >= cutoff){ pm[i,"infl"] <- 1 } else { pm[i,"infl"] <- 0 }
}
PM <- pm
# Plotting mahalanobis distance
par(mfrow=c(1,1),mar=c(6, 5, 4, 2) + 0.1)
plot(cd, xlab="Index",ylab="Cook's distance",
main=paste("Cook's Distance \ncut-off: 4/N =",round(cutoff,3),
" ( NInfl =",nrow(pm[pm$infl==1,]),")"),
pch=20,cex.lab=1.6,cex.axis=1.4,col.main="blue")
points(pm[pm$cd>=cutoff,"index"],
pm[pm$cd>=cutoff,"cd"],pch=20,cex=1.3,col="red")
text(pm[pm$cd>=cutoff,"index"]-10,
pm[pm$cd>=cutoff,"cd"]-0.005,
paste(pm[pm$cd>=cutoff,"ID"],pm[pm$cd>=cutoff,"day"]),cex=.8)
abline(h=cutoff,
lwd=1.8,col="blue",lty=2)
# participants showing cases above the cutoff
levels(as.factor(as.character(pm[pm$cd>cutoff,"ID"])))
## [1] "EM01" "EM07" "EM08" "EM09" "EM17" "EM22" "EM30" "EM36" "EM39" "EM47"
## [11] "em51" "MZ03" "MZ04" "MZ08" "MZ17" "MZ22" "MZ25" "MZ39" "MZ40" "MZ49"
## [21] "MZ54" "MZ56"
Comments:
In the next step, we re-estimate the parameters by excluding each potentially influential participant, one-by-one. We focus on substantial changes in the estimated parameters, such that the excluded cases implies a drastic change in the interpretation of the effects.
# parameters with all participants
parameters <- as.data.frame(round(summary(m)$coefficients,3))
parameters$ID <- "All"
IDs <- levels(as.factor(as.character(data2check[data2check$cd>cutoff,"ID"]))) # ID-day couples with CD > cutoff
for(ID in IDs){
new.data <- PM[PM$ID!=ID,]
m4 <- lmer(WT.dayAfter~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),new.data)
new.data <- as.data.frame(round(summary(m4)$coefficients,3))
new.data$ID <- ID
parameters <- rbind(parameters,new.data)
}
parameters <- cbind(parameters,rep(c("Intercept","week.timewe","SEXM","MEQ.r","PSQI","BDI.II","NAP",
"ALCOL_week","FUMO..SI.NO.1","CAFFE_week"),
length(IDs)+1))
colnames(parameters) <- c("est","st.err","t","ID","pred")
parameters
# week.time
p1 <- ggplot(parameters[parameters$pred=="week.timewe",],aes(ID,est))+
geom_point()+ggtitle("week.time effect - original (red) vs. removing cases")+xlab("")+
geom_point(data=parameters[parameters$pred=="week.timewe"¶meters$ID=="All",],colour="red")+
theme(axis.text.x = element_blank())
p2 <- ggplot(parameters[parameters$pred=="week.timewe",],aes(ID,t))+
geom_point()+ggtitle("week.time effect - original (red) vs. removing cases")+xlab("")+
geom_point(data=parameters[parameters$pred=="week.timewe"¶meters$ID=="All",],colour="red")+
theme(axis.text.x = element_blank())
gridExtra::grid.arrange(p1,p2,nrow=2)
Comments:
None of the estimated parameters seems to be substantially affected by potentially influential cases.
MZ22 is the participant influencing more parameters, and particoularly MEQ-r, alcohol, smoke, and coffee comsumption. However, the exclusion of these participants does not change the interpretation of the results (i.e., non-significant effects remain non-significant, positive effects remain positive and negative effects remain negative). The strongest influence of this participant is on alcohol assumption, whose estimated parameter changes from .05 (t = 1.82) to .03 (t = 1.08) after the removal of MZ22.
infl <- influence.ME::influence(m,"ID")
cd <- influence.ME::cooks.distance.estex(infl, parameter=8, sort=TRUE)[,1]
cd[which(names(cd)=="MZ22")]
## MZ22
## 0.5399679
4/nlevels(data2check$ID) # cutoff
## [1] 0.04878049
Conclusions:
We decide to keep all participants in WT data analysis.
In this section, we use the information from previous sections to specify a set of LMER models for each sleep metric of interest. As mentioned above, we proceed with a hierarchical regression by including the predictors of interest one-by-one, and we perform a model comparison starting from a null model (with no predictors) until the model including all predictors. Each model is compared to the previous one using the likelihood ratio test (LRT) and the Aikake Information Criterion (AIC).
library(lme4); library(effects); library(MuMIn); library(arm) # modeling
library(gridExtra); library(jtools); library(knitr); library(sjPlot) # outputs
options(knitr.kable.NA = '')
For each sleep measure:
We specify the models including the predictors in the order mentioned above.
The set of models is precessed using the LRT and the AIC. The AIC weight (AICw) is used to express AIC in a probabilistic way (i.e., each model is associated with a probability, from 0 to 1, to be the most plausible model), and the evidence ratio (dAIC) is used to assess how much each model is less plausible than the selected model.
Finally, the outcomes of the selected model are assessed.
TST is modeled assuming a normal distribution for residuals and including all participants (see previous sections).
# excluding missing observations (3 rows)
clean <- sleep.long[!is.na(sleep.long$TST)&!is.na(sleep.long$PSQI)&!is.na(sleep.long$BDI.II)&
!is.na(sleep.long$STAI.Y2)&!is.na(sleep.long$MEQ.r),]
# null model
null <- lmer(TST~(1|ID),
data=clean,REML=FALSE)
# predictors
week.time <- lmer(TST~week.time+(1|ID),
data=clean,REML=FALSE)
SEX <- lmer(TST~week.time+SEX+(1|ID),
data=clean,REML=FALSE)
MEQ.r <- lmer(TST~week.time+SEX+MEQ.r+(1|ID),
data=clean,REML=FALSE)
PSQI <- lmer(TST~week.time+SEX+MEQ.r+PSQI+(1|ID),
data=clean,REML=FALSE)
BDI <- lmer(TST~week.time+SEX+MEQ.r+PSQI+BDI.II+(1|ID),
data=clean,REML=FALSE)
NAP <- lmer(TST~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+(1|ID),
data=clean,REML=FALSE)
sub <- lmer(TST~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),
data=clean,REML=FALSE)
All models converge with no problems. We can proceed with the model comparison.
LRT <- lmtest::lrtest(null,week.time,SEX,MEQ.r,PSQI,BDI,NAP,sub)
LRT
A <- round(AIC(null,week.time,SEX,MEQ.r,PSQI,BDI,NAP,sub),2)
A <- A[order(A$AIC),]
A$AICw <- round(Weights(A$AIC),2)
A$dAIC <- round(exp((A$AIC - A[1,2])/2),2)
A
Comments:
The model including the effect of week.time and that including also sex are the only models associated to a significantly higher logLikelihood compared to the null model for TST (Chisq(1) = 10.05, p = .001).
The model including only sex and week.time is also the model with the lowest AIC (AICw = -.57), followed by the model including MEQ.r (AICw = .21). The dAIC suggests that the former is about 2.5 times more plausible than the latter.
Now, let’s proceed by evaluating the output of the selected model.
summary(SEX)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: TST ~ week.time + SEX + (1 | ID)
## Data: clean
##
## AIC BIC logLik deviance df.resid
## 1852.3 1874.0 -921.1 1842.3 566
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -4.3789 -0.5449 0.0632 0.5636 3.8587
##
## Random effects:
## Groups Name Variance Std.Dev.
## ID (Intercept) 0.2665 0.5162
## Residual 1.2983 1.1394
## Number of obs: 571, groups: ID, 82
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 6.7188 0.1057 63.549
## week.timewe 0.2305 0.1054 2.187
## SEXM -0.4874 0.1491 -3.269
##
## Correlation of Fixed Effects:
## (Intr) wk.tmw
## week.timewe -0.285
## SEXM -0.651 -0.003
Comments:
Below, the effects are plotted and a summary table is generated.
# effect plot
plot(allEffects(SEX))
# output table
TST <- cbind(Model=c("Intercept","Week time","Sex (males)","MEQ-r","PSQI","BDI","NAP","substances"),
round(LRT[,c("Chisq","Pr(>Chisq)")],2),
A[order(A$df),],
rbind(round(coef(summary(SEX)),2),rep(NA,3),rep(NA,3),rep(NA,3),rep(NA,3),rep(NA,3)))
rownames(TST) <- rownames(LRT)
kable(TST,format="pandoc",digits=2,align="l",caption="LMER models comparison and outputs")
| Model | Chisq | Pr(>Chisq) | df | AIC | AICw | dAIC | Estimate | Std. Error | t value |
|---|---|---|---|---|---|---|---|---|---|
| Intercept | 3 | 1863.03 | 0.00 | 219.20 | 6.72 | 0.11 | 63.55 | ||
| Week time | 4.73 | 0.03 | 4 | 1860.30 | 0.01 | 55.98 | 0.23 | 0.11 | 2.19 |
| Sex (males) | 10.05 | 0.00 | 5 | 1852.25 | 0.57 | 1.00 | -0.49 | 0.15 | -3.27 |
| MEQ-r | 0.04 | 0.84 | 6 | 1854.22 | 0.21 | 2.68 | |||
| PSQI | 0.17 | 0.68 | 7 | 1856.04 | 0.09 | 6.65 | |||
| BDI | 1.69 | 0.19 | 8 | 1856.35 | 0.07 | 7.77 | |||
| NAP | 0.09 | 0.76 | 9 | 1858.26 | 0.03 | 20.19 | |||
| substances | 5.40 | 0.14 | 12 | 1858.86 | 0.02 | 27.25 |
Finally, we compute 95% Bootstrap confidence intervals around the Intercept of the null model, to have a general idea of the sample average TST. We do not use the intercept of the selected model since it is related to specific levels of the included predictors (i.e., females during weekdays).
# bootsrap confidence intervals
ci <- lme4::confint.merMod(null,method="boot",nsim=1500,level=0.95)
ci <- as.data.frame(ci[3:nrow(ci),])
ci # intercept CI (minutes)
ci/60 # intercept CI (hours)
Conclusions:
Total sleep duration was predicted only by participants’ sex, with males showing a shorter duration than females. On average, participants sleep between 6:29 and 6:43 hours.
SE is modeled assuming a normal distribution for residuals and excluding participant EM07 and MZ53 (see previous sections).
# excluding missing observations and influential case (10 rows)
clean <- sleep.long[!is.na(sleep.long$SE)&!is.na(sleep.long$PSQI)&!is.na(sleep.long$BDI.II)&
!is.na(sleep.long$STAI.Y2)&!is.na(sleep.long$MEQ.r)&
sleep.long$ID!="EM07" & sleep.long$ID!="MZ53",]
# null model
null <- lmer(SE~(1|ID),
data=clean,REML=FALSE)
# predictors
week.time <- lmer(SE~week.time+(1|ID),
data=clean,REML=FALSE)
SEX <- lmer(SE~week.time+SEX+(1|ID),
data=clean,REML=FALSE)
MEQ.r <- lmer(SE~week.time+SEX+MEQ.r+(1|ID),
data=clean,REML=FALSE)
PSQI <- lmer(SE~week.time+SEX+MEQ.r+PSQI+(1|ID),
data=clean,REML=FALSE)
BDI <- lmer(SE~week.time+SEX+MEQ.r+PSQI+BDI.II+(1|ID),
data=clean,REML=FALSE)
NAP <- lmer(SE~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+(1|ID),
data=clean,REML=FALSE)
sub <- lmer(SE~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),
data=clean,REML=FALSE)
All models converge with no problems. We can proceed with the model comparison.
LRT <- lmtest::lrtest(null,week.time,SEX,MEQ.r,PSQI,BDI,NAP,sub)
LRT
A <- round(AIC(null,week.time,SEX,MEQ.r,PSQI,BDI,NAP,sub),2)
A <- A[order(A$AIC),]
A$AICw <- round(Weights(A$AIC),2)
A$dAIC <- round(exp((A$AIC - A[1,2])/2),2)
A
Comments:
None of the specified models shows a significantly higher logLikelihood than the null model.
Consistently, the null model shows the lowest AIC (AICw = .37), with model 1 (including only week.time) showing the second lowest AIC (AICw = .25, dAIC = 1.50). The dAIC suggests that the null model is about 2.5 times more plausible than model 1.
# output table
SE <- cbind(Model=c("Null","Week time","Sex (males)","MEQ-r","PSQI","BDI","NAP","substances"),
round(LRT[,c("Chisq","Pr(>Chisq)")],2),
A[order(A$df),],
rbind(round(coef(summary(null)),2),rep(NA,3),rep(NA,3),rep(NA,3),rep(NA,3),rep(NA,3),rep(NA,3),rep(NA,3)))
kable(SE,format="pandoc",digits=2,align="l",caption="LMER models comparison and outputs")
| Model | Chisq | Pr(>Chisq) | df | AIC | AICw | dAIC | Estimate | Std. Error | t value |
|---|---|---|---|---|---|---|---|---|---|
| Null | 3 | 3051.75 | 0.37 | 1.00 | 86.84 | 0.43 | 201.66 | ||
| Week time | 1.19 | 0.28 | 4 | 3052.56 | 0.25 | 1.50 | |||
| Sex (males) | 1.57 | 0.21 | 5 | 3052.99 | 0.20 | 1.86 | |||
| MEQ-r | 0.02 | 0.89 | 6 | 3054.97 | 0.07 | 5.00 | |||
| PSQI | 0.98 | 0.32 | 7 | 3055.98 | 0.04 | 8.29 | |||
| BDI | 0.83 | 0.36 | 8 | 3057.16 | 0.02 | 14.95 | |||
| NAP | 1.97 | 0.16 | 9 | 3057.19 | 0.02 | 15.18 | |||
| substances | 4.55 | 0.21 | 12 | 3058.63 | 0.01 | 31.19 |
Let’s see what the results would have been if EM07 and MZ53 were not excluded.
# excluding missing observations and influential case (10 rows)
clean <- sleep.long[!is.na(sleep.long$SE)&!is.na(sleep.long$PSQI)&!is.na(sleep.long$BDI.II)&
!is.na(sleep.long$STAI.Y2)&!is.na(sleep.long$MEQ.r),]
# null model
null <- lmer(SE~(1|ID),
data=clean,REML=FALSE)
# predictors
week.time <- lmer(SE~week.time+(1|ID),
data=clean,REML=FALSE)
SEX <- lmer(SE~week.time+SEX+(1|ID),
data=clean,REML=FALSE)
MEQ.r <- lmer(SE~week.time+SEX+MEQ.r+(1|ID),
data=clean,REML=FALSE)
PSQI <- lmer(SE~week.time+SEX+MEQ.r+PSQI+(1|ID),
data=clean,REML=FALSE)
BDI <- lmer(SE~week.time+SEX+MEQ.r+PSQI+BDI.II+(1|ID),
data=clean,REML=FALSE)
NAP <- lmer(SE~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+(1|ID),
data=clean,REML=FALSE)
sub <- lmer(SE~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),
data=clean,REML=FALSE)
LRT <- lmtest::lrtest(null,week.time,SEX,MEQ.r,PSQI,BDI,NAP,sub)
LRT
A <- round(AIC(null,week.time,SEX,MEQ.r,PSQI,BDI,NAP,sub),2)
A <- A[order(A$AIC),]
A$AICw <- round(Weights(A$AIC),2)
A$dAIC <- round(exp((A$AIC - A[1,2])/2),2)
A
summary(sub)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula:
## SE ~ week.time + SEX + MEQ.r + PSQI + BDI.II + NAP + ALCOL_week +
## FUMO..SI.NO. + CAFFE_week + (1 | ID)
## Data: clean
##
## AIC BIC logLik deviance df.resid
## 3254.4 3306.5 -1615.2 3230.4 559
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -7.9886 -0.4386 0.0794 0.5408 3.0929
##
## Random effects:
## Groups Name Variance Std.Dev.
## ID (Intercept) 17.95 4.236
## Residual 11.79 3.434
## Number of obs: 571, groups: ID, 82
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 91.039003 2.490126 36.560
## week.timewe 0.166186 0.318242 0.522
## SEXM -1.786910 1.111971 -1.607
## MEQ.r -0.129717 0.143768 -0.902
## PSQI -0.003726 0.219586 -0.017
## BDI.II -0.044591 0.078835 -0.566
## NAP1 -0.908227 0.453040 -2.005
## ALCOL_week 0.122889 0.168164 0.731
## FUMO..SI.NO.1 -2.611636 1.182318 -2.209
## CAFFE_week -0.117700 0.066067 -1.782
##
## Correlation of Fixed Effects:
## (Intr) wk.tmw SEXM MEQ.r PSQI BDI.II NAP1 ALCOL_ FUMO..
## week.timewe -0.038
## SEXM -0.212 -0.001
## MEQ.r -0.809 0.000 0.028
## PSQI -0.403 -0.001 0.151 0.040
## BDI.II -0.124 0.001 -0.149 0.140 -0.547
## NAP1 -0.020 0.056 0.020 -0.001 0.001 -0.017
## ALCOL_week -0.031 0.000 -0.445 0.026 -0.130 0.185 -0.010
## FUMO..SI.NO -0.168 -0.001 0.044 0.163 0.062 -0.095 -0.019 -0.291
## CAFFE_week -0.042 -0.001 0.197 -0.212 0.028 -0.054 -0.017 -0.332 -0.134
Comments:
Conclusions:
None of the included predictors exerted a significant effect on SE. The effect of NAP was due to participant EM07, and the effect of Smoke was due to participant MZ53.
SOL is log-transformed and modeled assuming a normal distribution for residuals. All participants are included in the analysis (see previous sections).
# excluding missing observations (3 rows)
clean <- sleep.long[!is.na(sleep.long$SOL)&!is.na(sleep.long$PSQI)&!is.na(sleep.long$BDI.II)&
!is.na(sleep.long$STAI.Y2)&!is.na(sleep.long$MEQ.r),]
# null model
null <- lmer(log(SOL+.05)~(1|ID),
data=clean,REML=FALSE)
# predictors
week.time <- lmer(log(SOL+.05)~week.time+(1|ID),
data=clean,REML=FALSE)
SEX <- lmer(log(SOL+.05)~week.time+SEX+(1|ID),
data=clean,REML=FALSE)
MEQ.r <- lmer(log(SOL+.05)~week.time+SEX+MEQ.r+(1|ID),
data=clean,REML=FALSE)
PSQI <- lmer(log(SOL+.05)~week.time+SEX+MEQ.r+PSQI+(1|ID),
data=clean,REML=FALSE)
BDI <- lmer(log(SOL+.05)~week.time+SEX+MEQ.r+PSQI+BDI.II+(1|ID),
data=clean,REML=FALSE)
NAP <- lmer(log(SOL+.05)~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+(1|ID),
data=clean,REML=FALSE)
sub <- lmer(log(SOL+.05)~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),
data=clean,REML=FALSE)
All models converge with no problems. We can proceed with the model comparison.
LRT <- lmtest::lrtest(null,week.time,SEX,MEQ.r,PSQI,BDI,NAP,sub)
LRT
A <- round(AIC(null,week.time,SEX,MEQ.r,PSQI,BDI,NAP,sub),2)
A <- A[order(A$AIC),]
A$AICw <- round(Weights(A$AIC),2)
A$dAIC <- round(exp((A$AIC - A[1,2])/2),2)
A
Comments:
The models including the effect of BDI-II and substances are the only models associated with a significantly higher logLikelihood compared to the null model for SE.
Coherently, the last model shows the lowest AIC (AICw = .44), with the model with BDI-II showing the second lowest AIC (AICw = .17). The dAIC suggests that the former is about 2.64 times more plausible than the latter.
Now, let’s proceed by evaluating the output of the selected model.
summary(sub)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula:
## log(SOL + 0.05) ~ week.time + SEX + MEQ.r + PSQI + BDI.II + NAP +
## ALCOL_week + FUMO..SI.NO. + CAFFE_week + (1 | ID)
## Data: clean
##
## AIC BIC logLik deviance df.resid
## 2216.6 2268.7 -1096.3 2192.6 559
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.0690 -0.3713 0.1379 0.6418 2.1937
##
## Random effects:
## Groups Name Variance Std.Dev.
## ID (Intercept) 0.5539 0.7442
## Residual 2.3710 1.5398
## Number of obs: 571, groups: ID, 82
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 1.39892 0.53359 2.622
## week.timewe -0.04886 0.14265 -0.342
## SEXM 0.60406 0.23763 2.542
## MEQ.r -0.04752 0.03069 -1.548
## PSQI -0.03439 0.04686 -0.734
## BDI.II 0.03629 0.01684 2.155
## NAP1 0.23762 0.19455 1.221
## ALCOL_week -0.08853 0.03588 -2.467
## FUMO..SI.NO.1 -0.13386 0.25244 -0.530
## CAFFE_week 0.03042 0.01409 2.158
##
## Correlation of Fixed Effects:
## (Intr) wk.tmw SEXM MEQ.r PSQI BDI.II NAP1 ALCOL_ FUMO..
## week.timewe -0.079
## SEXM -0.210 -0.001
## MEQ.r -0.807 0.000 0.027
## PSQI -0.403 -0.001 0.152 0.041
## BDI.II -0.124 0.001 -0.152 0.141 -0.546
## NAP1 -0.040 0.054 0.041 -0.001 0.003 -0.034
## ALCOL_week -0.031 0.001 -0.446 0.026 -0.130 0.187 -0.021
## FUMO..SI.NO -0.169 -0.002 0.043 0.164 0.063 -0.094 -0.038 -0.290
## CAFFE_week -0.041 -0.002 0.196 -0.212 0.028 -0.053 -0.034 -0.332 -0.133
Comments:
Below, the effects are plotted and a summary table is generated.
# effect plots
plot_model(sub, type = "eff",dot.size=7,line.size=3)$'SEX'+geom_line(lwd=2)+
labs(x="Gender",y="log(SOL + .05)")+ggtitle("")+
# geom_point(data=sleep.long,aes(BDI.II,SOL),alpha=.3)+ylim(0,3)+xlim(0,35)+
theme(text=element_text(size=20,face="bold",family="time"))
plot_model(sub, type = "eff")$'BDI.II'+geom_line(lwd=2)+
labs(x="BDI-II scores",y="log(SOL + .05)")+ggtitle("")+
geom_point(data=sleep.long,aes(BDI.II,SOL),alpha=.3)+ylim(0,3)+xlim(0,35)+
theme(text=element_text(size=20,face="bold",family="time"))
plot_model(sub, type = "eff")$'ALCOL_week'+geom_line(lwd=2)+
labs(x="Weekly alcohol consumption (alcoholic units)",y="log(SOL + .05)")+ggtitle("")+
geom_point(data=sleep.long,aes(ALCOL_week,SOL),alpha=.3)+ylim(-.5,2)+xlim(0,13)+
theme(text=element_text(size=20,face="bold",family="time"))
plot_model(sub, type = "eff")$'CAFFE_week'+geom_line(lwd=2)+
labs(x="Weekly coffee consumption (n. of cups)",y="log(SOL + .05)")+ggtitle("")+
geom_point(data=sleep.long,aes(CAFFE_week,SOL),alpha=.3)+ylim(-.5,3)+xlim(0,35)+
theme(text=element_text(size=20,face="bold",family="time"))
# output table
SOL <- cbind(Model=c("Null","Week time","Sex (males)","MEQ-r","PSQI","BDI","NAP",
"substances (alcohol)","substances (smoke)","substances (coffee)"),
rbind(round(LRT[,c("Chisq","Pr(>Chisq)")],2),NA,NA),
rbind(A[order(A$df),],NA,NA),
round(coef(summary(sub)),2))
kable(SOL,format="pandoc",digits=2,align="l",caption="LMER models comparison and outputs")
| Model | Chisq | Pr(>Chisq) | df | AIC | AICw | dAIC | Estimate | Std. Error | t value |
|---|---|---|---|---|---|---|---|---|---|
| Null | 3 | 2219.37 | 0.11 | 4.08 | 1.40 | 0.53 | 2.62 | ||
| Week time | 0.17 | 0.68 | 4 | 2221.20 | 0.04 | 10.18 | -0.05 | 0.14 | -0.34 |
| Sex (males) | 2.39 | 0.12 | 5 | 2220.81 | 0.05 | 8.37 | 0.60 | 0.24 | 2.54 |
| MEQ-r | 2.06 | 0.15 | 6 | 2220.75 | 0.05 | 8.13 | -0.05 | 0.03 | -1.55 |
| PSQI | 0.19 | 0.66 | 7 | 2222.57 | 0.02 | 20.19 | -0.03 | 0.05 | -0.73 |
| BDI | 6.07 | 0.01 | 8 | 2218.50 | 0.17 | 2.64 | 0.04 | 0.02 | 2.16 |
| NAP | 1.35 | 0.24 | 9 | 2219.14 | 0.12 | 3.63 | 0.24 | 0.19 | 1.22 |
| substances (alcohol) | 8.59 | 0.04 | 12 | 2216.56 | 0.44 | 1.00 | -0.09 | 0.04 | -2.47 |
| substances (smoke) | -0.13 | 0.25 | -0.53 | ||||||
| substances (coffee) | 0.03 | 0.01 | 2.16 |
Conclusions:
The model comparison suggested a positive relationship between SOL and depressive symptoms, a negative relationship between SOL and the weekly amount of alcoholic units, a positive relationship between SOL and the weekly amount of coffees, and a higher SOL among smokers compared to non-smokers. Moreover, males showed shorted SOL than females.
WASO is modeled assuming a normal distribution for residuals and by excluding participant MZ53 (see previous sections).
# excluding missing observations (10 rows)
clean <- sleep.long[!is.na(sleep.long$TST)&!is.na(sleep.long$PSQI)&!is.na(sleep.long$BDI.II)&
!is.na(sleep.long$STAI.Y2)&!is.na(sleep.long$MEQ.r)&
sleep.long!="MZ53",]
# null model
null <- lmer(WASO~(1|ID),
data=clean,REML=FALSE)
# predictors
week.time <- lmer(WASO~week.time+(1|ID),
data=clean,REML=FALSE)
SEX <- lmer(WASO~week.time+SEX+(1|ID),
data=clean,REML=FALSE)
MEQ.r <- lmer(WASO~week.time+SEX+MEQ.r+(1|ID),
data=clean,REML=FALSE)
PSQI <- lmer(WASO~week.time+SEX+MEQ.r+PSQI+(1|ID),
data=clean,REML=FALSE)
BDI <- lmer(WASO~week.time+SEX+MEQ.r+PSQI+BDI.II+(1|ID),
data=clean,REML=FALSE)
NAP <- lmer(WASO~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+(1|ID),
data=clean,REML=FALSE)
sub <- lmer(WASO~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),
data=clean,REML=FALSE)
All models converge with no problems. We can proceed with the model comparison.
LRT <- lmtest::lrtest(null,week.time,SEX,MEQ.r,PSQI,BDI,NAP,sub)
LRT
A <- round(AIC(null,week.time,SEX,MEQ.r,PSQI,BDI,NAP,sub),2)
A <- A[order(A$AIC),]
A$AICw <- round(Weights(A$AIC),2)
A$dAIC <- round(exp((A$AIC - A[1,2])/2),2)
A
Comments:
None of the specified models shows a significantly higher logLikelihood than the null model.
Consistently, the null model shows the lowest AIC (AICw = .40), with model 1 (including only week.time) showing the second lowest AIC (AICw = .35, dAIC = 1.14). The dAIC suggests that the null model is only slightly more plausible than model 1.
# output table
WASO <- cbind(Model=c("Null","Week time","Sex (males)","MEQ-r","PSQI","BDI","NAP","substances"),
round(LRT[,c("Chisq","Pr(>Chisq)")],2),
A[order(A$df),],
rbind(round(coef(summary(null)),2),rep(NA,3),rep(NA,3),rep(NA,3),rep(NA,3),rep(NA,3),rep(NA,3),rep(NA,3)))
kable(WASO,format="pandoc",digits=2,align="l",caption="Table X. LMER models comparison and outputs")
| Model | Chisq | Pr(>Chisq) | df | AIC | AICw | dAIC | Estimate | Std. Error | t value |
|---|---|---|---|---|---|---|---|---|---|
| Null | 3 | 5050.04 | 0.46 | 1.00 | 51.71 | 2.02 | 25.62 | ||
| Week time | 1.24 | 0.27 | 4 | 5050.80 | 0.32 | 1.46 | |||
| Sex (males) | 0.01 | 0.93 | 5 | 5052.79 | 0.12 | 3.96 | |||
| MEQ-r | 0.77 | 0.38 | 6 | 5054.02 | 0.06 | 7.32 | |||
| PSQI | 0.05 | 0.82 | 7 | 5055.97 | 0.02 | 19.39 | |||
| BDI | 0.56 | 0.46 | 8 | 5057.42 | 0.01 | 40.04 | |||
| NAP | 0.00 | 0.99 | 9 | 5059.41 | 0.00 | 108.31 | |||
| substances | 4.07 | 0.25 | 12 | 5061.34 | 0.00 | 284.29 |
Let’s see what would have been the results if MZ53 was not excluded.
# excluding missing observations and influential case (10 rows)
clean <- sleep.long[!is.na(sleep.long$WASO)&!is.na(sleep.long$PSQI)&!is.na(sleep.long$BDI.II)&
!is.na(sleep.long$STAI.Y2)&!is.na(sleep.long$MEQ.r),]
# null model
null <- lmer(WASO~(1|ID),
data=clean,REML=FALSE)
# predictors
week.time <- lmer(WASO~week.time+(1|ID),
data=clean,REML=FALSE)
SEX <- lmer(WASO~week.time+SEX+(1|ID),
data=clean,REML=FALSE)
MEQ.r <- lmer(WASO~week.time+SEX+MEQ.r+(1|ID),
data=clean,REML=FALSE)
PSQI <- lmer(WASO~week.time+SEX+MEQ.r+PSQI+(1|ID),
data=clean,REML=FALSE)
BDI <- lmer(WASO~week.time+SEX+MEQ.r+PSQI+BDI.II+(1|ID),
data=clean,REML=FALSE)
NAP <- lmer(WASO~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+(1|ID),
data=clean,REML=FALSE)
sub <- lmer(WASO~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),
data=clean,REML=FALSE)
LRT <- lmtest::lrtest(null,week.time,SEX,MEQ.r,PSQI,BDI,NAP,sub)
LRT
A <- round(AIC(null,week.time,SEX,MEQ.r,PSQI,BDI,NAP,sub),2)
A <- A[order(A$AIC),]
A$AICw <- round(Weights(A$AIC),2)
A$dAIC <- round(exp((A$AIC - A[1,2])/2),2)
A
summary(sub)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula:
## WASO ~ week.time + SEX + MEQ.r + PSQI + BDI.II + NAP + ALCOL_week +
## FUMO..SI.NO. + CAFFE_week + (1 | ID)
## Data: clean
##
## AIC BIC logLik deviance df.resid
## 5163.7 5215.8 -2569.8 5139.7 559
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.4659 -0.5801 -0.0421 0.5483 4.3164
##
## Random effects:
## Groups Name Variance Std.Dev.
## ID (Intercept) 337.2 18.36
## Residual 354.9 18.84
## Number of obs: 571, groups: ID, 82
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 36.9767 11.0773 3.338
## week.timewe 1.1276 1.7456 0.646
## SEXM 4.9654 4.9445 1.004
## MEQ.r 0.8854 0.6392 1.385
## PSQI 0.1044 0.9762 0.107
## BDI.II -0.3125 0.3505 -0.892
## NAP1 -0.4684 2.4681 -0.190
## ALCOL_week -0.7457 0.7476 -0.997
## FUMO..SI.NO.1 11.4342 5.2566 2.175
## CAFFE_week 0.3529 0.2937 1.201
##
## Correlation of Fixed Effects:
## (Intr) wk.tmw SEXM MEQ.r PSQI BDI.II NAP1 ALCOL_ FUMO..
## week.timewe -0.047
## SEXM -0.212 -0.001
## MEQ.r -0.808 0.000 0.028
## PSQI -0.403 -0.001 0.151 0.040
## BDI.II -0.124 0.001 -0.149 0.140 -0.547
## NAP1 -0.024 0.056 0.025 -0.001 0.001 -0.020
## ALCOL_week -0.031 0.000 -0.445 0.026 -0.130 0.185 -0.012
## FUMO..SI.NO -0.168 -0.002 0.044 0.163 0.062 -0.095 -0.023 -0.291
## CAFFE_week -0.042 -0.002 0.197 -0.212 0.028 -0.054 -0.021 -0.332 -0.134
Comments:
Conclusions:
None of the included predictors exerted a significant effect on WASO. The effect of Smoke was not substantial, and mainly due to participant MZ53.
BT is modeled assuming a normal distribution for residuals and including all participants (see previous sections).
# excluding missing observations (3 rows)
clean <- sleep.long[!is.na(sleep.long$BT)&!is.na(sleep.long$PSQI)&!is.na(sleep.long$BDI.II)&
!is.na(sleep.long$STAI.Y2)&!is.na(sleep.long$MEQ.r),]
# null model
null <- lmer(BT~(1|ID),
data=clean,REML=FALSE)
# predictors
week.time <- lmer(BT~week.time+(1|ID),
data=clean,REML=FALSE)
SEX <- lmer(BT~week.time+SEX+(1|ID),
data=clean,REML=FALSE)
MEQ.r <- lmer(BT~week.time+SEX+MEQ.r+(1|ID),
data=clean,REML=FALSE)
PSQI <- lmer(BT~week.time+SEX+MEQ.r+PSQI+(1|ID),
data=clean,REML=FALSE)
BDI <- lmer(BT~week.time+SEX+MEQ.r+PSQI+BDI.II+(1|ID),
data=clean,REML=FALSE)
NAP <- lmer(BT~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+(1|ID),
data=clean,REML=FALSE)
sub <- lmer(BT~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),
data=clean,REML=FALSE)
All models converge with no problems. We can proceed with the model comparison.
LRT <- lmtest::lrtest(null,week.time,SEX,MEQ.r,PSQI,BDI,NAP,sub)
LRT
A <- round(AIC(null,week.time,SEX,MEQ.r,PSQI,BDI,NAP,sub),2)
A <- A[order(A$AIC),]
A$AICw <- round(Weights(A$AIC),2)
A$dAIC <- round(exp((A$AIC - A[1,2])/2),2)
A
Comments:
The model including the effect of week.time shows a significantly higher logLikelihood compared to the null model for BT. Moreover, the LRT suggests a significant contribution of participants’ sex over week time, a significant contribution of MEQ-r score over week time and sex, and a significant contribution of substances’ effect over the other predictors.
Coherently, the last model (with the effect of substances) shows the lowest AIC (AICw = .55), with the model with MEQ-r showing the second lowest AIC (AICw = .30). The dAIC suggests that the former is about 2 times more plausible than the latter.
Now, let’s proceed by evaluating the output of the selected model.
summary(sub)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula:
## BT ~ week.time + SEX + MEQ.r + PSQI + BDI.II + NAP + ALCOL_week +
## FUMO..SI.NO. + CAFFE_week + (1 | ID)
## Data: clean
##
## AIC BIC logLik deviance df.resid
## 2063.6 2115.8 -1019.8 2039.6 559
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.3411 -0.5607 -0.0853 0.4495 7.4691
##
## Random effects:
## Groups Name Variance Std.Dev.
## ID (Intercept) 0.6541 0.8087
## Residual 1.7316 1.3159
## Number of obs: 571, groups: ID, 82
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 26.359300 0.535306 49.242
## week.timewe 0.688773 0.121918 5.649
## SEXM 0.182231 0.238626 0.764
## MEQ.r -0.143328 0.030829 -4.649
## PSQI 0.002443 0.047078 0.052
## BDI.II 0.001894 0.016912 0.112
## NAP1 0.012768 0.168771 0.076
## ALCOL_week 0.071953 0.036051 1.996
## FUMO..SI.NO.1 0.420832 0.253573 1.660
## CAFFE_week 0.002933 0.014161 0.207
##
## Correlation of Fixed Effects:
## (Intr) wk.tmw SEXM MEQ.r PSQI BDI.II NAP1 ALCOL_ FUMO..
## week.timewe -0.067
## SEXM -0.211 -0.001
## MEQ.r -0.807 0.000 0.027
## PSQI -0.403 -0.001 0.151 0.040
## BDI.II -0.124 0.001 -0.151 0.140 -0.546
## NAP1 -0.034 0.055 0.035 -0.001 0.002 -0.029
## ALCOL_week -0.031 0.001 -0.445 0.026 -0.130 0.186 -0.018
## FUMO..SI.NO -0.168 -0.002 0.043 0.164 0.063 -0.094 -0.033 -0.290
## CAFFE_week -0.041 -0.002 0.196 -0.212 0.028 -0.054 -0.029 -0.332 -0.133
Comments:
BT is predicted by the week time, with weekend days showing later BT compared to week days. Moreover, BT was predicted by the MEQ-r score, with earlier BT being predicted by higher scores (i.e., indicating morning circadian preferences). Finally, weekly alcohol consumption shows a weaker positive effect on BT.
the effect of sex (which is positive, suggesting later BT for males) is not substantial when considering all other predictors.
Below, the effects are plotted and a summary table is generated.
# effect plots
plot_model(sub, type = "eff",dot.size=7,line.size=3)$'SEX'+
labs(x="Gender",y="Bed time")+ggtitle("")+
scale_x_continuous(breaks=c(1,2),labels=c("Females","Males"),limits=c(.75,2.25))+
scale_y_continuous(labels=c("00:48","01:00","01:12","01:24","01:36"))+
geom_segment(x=1,xend=2,y=25.11,yend=25.29,lwd=1.5,lty=2)+
theme(text=element_text(size=20,face="bold",family="time"))
plot_model(sub, type = "eff",dot.size=7,line.size=3)$'week.time'+
labs(x="Week time",y="Bed time")+ggtitle("")+
scale_x_continuous(breaks=c(1,2),labels=c("Weekend","Weekdays"),limits=c(.75,2.25))+
theme(text=element_text(size=20,face="bold",family="time"))
plot_model(sub, type = "eff")$'MEQ.r'+geom_line(lwd=2)+
labs(x="MEQ-r scores",y="Bedtime (hours from 00:00)")+ggtitle("")+
geom_point(data=sleep.long,aes(MEQ.r,BT),alpha=.3)+ylim(23.5,27)+xlim(5,20)+
theme(text=element_text(size=20,face="bold",family="time"))
# output table
BT <- cbind(Model=c("Null","Week time","Sex (males)","MEQ-r","PSQI","BDI","NAP",
"substances (alcohol)","substances (smoke)","substances (coffee)"),
rbind(round(LRT[,c("Chisq","Pr(>Chisq)")],2),NA,NA),
rbind(A[order(A$df),],NA,NA),
rbind(round(coef(summary(MEQ.r)),2),rep(NA,3),rep(NA,3),rep(NA,3),rep(NA,3),rep(NA,3),rep(NA,3)))
kable(BT,format="pandoc",digits=2,align="l",caption="Table X. LMER models comparison and outputs")
| Model | Chisq | Pr(>Chisq) | df | AIC | AICw | dAIC | Estimate | Std. Error | t value |
|---|---|---|---|---|---|---|---|---|---|
| Null | 3 | 2111.82 | 0.00 | 4.196073e+10 | 26.78 | 0.45 | 59.54 | ||
| Week time | 30.94 | 0.00 | 4 | 2082.87 | 0.00 | 2.169853e+04 | 0.69 | 0.12 | 5.65 |
| Sex (males) | 3.84 | 0.05 | 5 | 2081.03 | 0.00 | 8.647280e+03 | 0.43 | 0.23 | 1.88 |
| MEQ-r | 20.14 | 0.00 | 6 | 2062.90 | 0.44 | 1.000000e+00 | -0.15 | 0.03 | -4.78 |
| PSQI | 0.04 | 0.83 | 7 | 2064.85 | 0.17 | 2.650000e+00 | |||
| BDI | 0.02 | 0.89 | 8 | 2066.83 | 0.06 | 7.130000e+00 | |||
| NAP | 0.06 | 0.81 | 9 | 2068.77 | 0.02 | 1.882000e+01 | |||
| substances (alcohol) | 11.14 | 0.01 | 12 | 2063.63 | 0.31 | 1.440000e+00 | |||
| substances (smoke) | |||||||||
| substances (coffee) |
Conclusions:
The model’s comparison suggested a negative relationship between BT and the score on the MEQ-r, a positive relationship between BT and the weekly amount of consumed alcoholic units, and later BT during the weekend compared to weekdays.
WT is modeled assuming a normal distribution for residuals and including all participants (see previous sections). Note that for testing the effects of week.time on this variable, we should use WT.dayAfter, which is referred to the subsequent day (e.g., WT on day 7 is Monday WT). In this way, Monday to Friday are considered weekdays, whereas Saturday and Sunday are considered weekend days.
# excluding missing observations (3 rows)
clean <- sleep.long[!is.na(sleep.long$WT)&!is.na(sleep.long$PSQI)&!is.na(sleep.long$BDI.II)&
!is.na(sleep.long$STAI.Y2)&!is.na(sleep.long$MEQ.r),]
clean$WT <- clean$WT.dayAfter # using the WT referred to the subsequent day
# null model
null <- lmer(WT.dayAfter~(1|ID),
data=clean,REML=FALSE)
# predictors
week.time <- lmer(WT.dayAfter~week.time+(1|ID),
data=clean,REML=FALSE)
SEX <- lmer(WT.dayAfter~week.time+SEX+(1|ID),
data=clean,REML=FALSE)
MEQ.r <- lmer(WT.dayAfter~week.time+SEX+MEQ.r+(1|ID),
data=clean,REML=FALSE)
PSQI <- lmer(WT.dayAfter~week.time+SEX+MEQ.r+PSQI+(1|ID),
data=clean,REML=FALSE)
BDI <- lmer(WT.dayAfter~week.time+SEX+MEQ.r+PSQI+BDI.II+(1|ID),
data=clean,REML=FALSE)
NAP <- lmer(WT~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+(1|ID),
data=clean,REML=FALSE)
sub <- lmer(WT~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),
data=clean,REML=FALSE)
All models converge with no problems. We can proceed with the model comparison.
LRT <- lmtest::lrtest(null,week.time,SEX,MEQ.r,PSQI,BDI,NAP,sub)
LRT
A <- round(AIC(null,week.time,SEX,MEQ.r,PSQI,BDI,NAP,sub),2)
A <- A[order(A$AIC),]
A$AICw <- round(Weights(A$AIC),2)
A$dAIC <- round(exp((A$AIC - A[1,2])/2),2)
A
Comments:
The model including the effect of week.time shows a significantly higher logLikelihood compared to the null model for BT. Moreover, the LRT suggests a significant contribution of the MEQ-r score over week time and sex, and a significant contribution of substances’ effect over the other predictors.
Coherently, the last model (with the effect of substances) shows the lowest AIC (AICw = .59), with the model with MEQ-r showing the second lowest AIC (AICw = .26). The dAIC suggests that the former is about 2 times more plausible than the latter.
Now, let’s proceed by evaluating the output of the selected model.
summary(sub)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula:
## WT ~ week.time + SEX + MEQ.r + PSQI + BDI.II + NAP + ALCOL_week +
## FUMO..SI.NO. + CAFFE_week + (1 | ID)
## Data: clean
##
## AIC BIC logLik deviance df.resid
## 1877.0 1929.0 -926.5 1853.0 553
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.9786 -0.6070 -0.0609 0.5764 4.0169
##
## Random effects:
## Groups Name Variance Std.Dev.
## ID (Intercept) 0.3716 0.6096
## Residual 1.3315 1.1539
## Number of obs: 565, groups: ID, 82
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 10.217482 0.424172 24.088
## week.timewe 0.892776 0.107398 8.313
## SEXM -0.173819 0.189146 -0.919
## MEQ.r -0.141341 0.024402 -5.792
## PSQI 0.007015 0.037254 0.188
## BDI.II -0.008330 0.013381 -0.623
## NAP1 -0.017482 0.147722 -0.118
## ALCOL_week 0.054181 0.028495 1.901
## FUMO..SI.NO.1 0.380333 0.201060 1.892
## CAFFE_week -0.001215 0.011184 -0.109
##
## Correlation of Fixed Effects:
## (Intr) wk.tmw SEXM MEQ.r PSQI BDI.II NAP1 ALCOL_ FUMO..
## week.timewe -0.073
## SEXM -0.208 -0.004
## MEQ.r -0.807 -0.003 0.027
## PSQI -0.404 -0.003 0.149 0.042
## BDI.II -0.124 0.005 -0.155 0.139 -0.544
## NAP1 -0.035 0.054 0.036 -0.006 0.001 -0.028
## ALCOL_week -0.032 0.003 -0.447 0.026 -0.129 0.189 -0.018
## FUMO..SI.NO -0.172 -0.005 0.039 0.167 0.068 -0.092 -0.040 -0.286
## CAFFE_week -0.040 -0.003 0.197 -0.211 0.027 -0.055 -0.033 -0.332 -0.133
Comments:
Below, the effects are plotted and a summary table is generated.
# effect plots
plot_model(sub, type = "eff",dot.size=7,line.size=3)$'week.time'+
labs(x="Week time",y="Weak time (hours from 00:00)")+ggtitle("")+
scale_x_continuous(breaks=c(1,2),labels=c("Weekend","Weekdays"),limits=c(.75,2.25))+
theme(text=element_text(size=20,face="bold",family="time"))
plot_model(sub, type = "eff")$'MEQ.r'+geom_line(lwd=2)+
labs(x="MEQ-r scores",y="Weak time (hours from 00:00)")+ggtitle("")+
geom_point(data=sleep.long,aes(MEQ.r,WT),alpha=.3)+ylim(6,10.5)+xlim(5,20)+
theme(text=element_text(size=20,face="bold",family="time"))
# output table
WT <- cbind(Model=c("Null","Week time","Sex (males)","MEQ-r","PSQI","BDI","NAP",
"substances (alcohol)","substances (smoke)","substances (coffee)"),
rbind(round(LRT[,c("Chisq","Pr(>Chisq)")],2),NA,NA),
rbind(A[order(A$df),],NA,NA),
rbind(round(coef(summary(MEQ.r)),2),rep(NA,3),rep(NA,3),rep(NA,3),rep(NA,3),rep(NA,3),rep(NA,3)))
kable(WT,format="pandoc",digits=2,align="l",caption="Table X. LMER models comparison and outputs")
| Model | Chisq | Pr(>Chisq) | df | AIC | AICw | dAIC | Estimate | Std. Error | t value |
|---|---|---|---|---|---|---|---|---|---|
| Null | 3 | 1963.99 | 0.00 | 9.473215e+18 | 10.43 | 0.36 | 29.24 | ||
| Week time | 64.64 | 0.00 | 4 | 1901.35 | 0.00 | 2.368068e+05 | 0.89 | 0.11 | 8.33 |
| Sex (males) | 0.18 | 0.67 | 5 | 1903.17 | 0.00 | 5.883045e+05 | 0.01 | 0.18 | 0.07 |
| MEQ-r | 28.57 | 0.00 | 6 | 1876.60 | 0.41 | 1.000000e+00 | -0.14 | 0.02 | -5.85 |
| PSQI | 0.01 | 0.91 | 7 | 1878.59 | 0.15 | 2.700000e+00 | |||
| BDI | 0.69 | 0.41 | 8 | 1879.90 | 0.08 | 5.210000e+00 | |||
| NAP | 0.02 | 0.90 | 9 | 1881.89 | 0.03 | 1.408000e+01 | |||
| substances (alcohol) | 10.90 | 0.01 | 12 | 1876.99 | 0.34 | 1.220000e+00 | |||
| substances (smoke) | |||||||||
| substances (coffee) |
Conclusions:
The model comparison suggested a negative relationship between WT and the score on the MEQ-r, a positive relationship between WT and the weekly amount of consumed alcoholic units, and later BT during the weekend compared to weekdays.